Association Rules Mining and NOSQL Oriented Document in Big Data
Authors: Sarra Senhadji, Imene Benzeguimi, Zohra Yagoub
Abstract:
Big Data refers to recent technologies for manipulating voluminous, unstructured data sets drawn from multiple sources. NoSQL databases emerged to handle the problem of unstructured data. Association rule mining is a popular data mining technique for extracting hidden relationships from transactional databases. The search for association dependencies lends itself well to Map Reduce. The goal of our work is to reduce the time needed to generate frequent itemsets by using Map Reduce and a document-oriented NoSQL database. A comparative study evaluates the performance of our algorithm against the classical Apriori algorithm.
Keywords: Apriori, Association rules mining, Big Data, data mining, Hadoop, Map Reduce, MongoDB, NoSQL.
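As an illustration of the idea summarized in the abstract, the following is a minimal single-process sketch of Apriori-style frequent itemset generation arranged as map and reduce steps. It is not the authors' algorithm: the function names (`map_phase`, `reduce_phase`, `frequent_itemsets`), the sample transactions, and the support threshold are all assumptions chosen for illustration, and no Hadoop or MongoDB calls are involved.

```python
from collections import Counter
from itertools import combinations


def map_phase(transaction, k, prev_frequent):
    """Emit (itemset, 1) for each k-item candidate found in one transaction.

    Apriori pruning: a k-itemset is emitted only if every (k-1)-subset
    was frequent in the previous round (hypothetical helper, for illustration).
    """
    items = sorted(set(transaction))
    pairs = []
    for combo in combinations(items, k):
        if k == 1 or all(
            frozenset(sub) in prev_frequent
            for sub in combinations(combo, k - 1)
        ):
            pairs.append((frozenset(combo), 1))
    return pairs


def reduce_phase(pairs, min_support):
    """Sum the counts per itemset and keep those reaching min_support."""
    counts = Counter()
    for itemset, n in pairs:
        counts[itemset] += n
    return {s: c for s, c in counts.items() if c >= min_support}


def frequent_itemsets(transactions, min_support):
    """Level-wise search: one map/reduce round per itemset size k."""
    result, prev_frequent, k = {}, set(), 1
    while True:
        pairs = [p for t in transactions for p in map_phase(t, k, prev_frequent)]
        level = reduce_phase(pairs, min_support)
        if not level:
            return result
        result.update(level)
        prev_frequent = set(level)
        k += 1


# Assumed sample data: each inner list stands for one transaction.
transactions = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "butter"],
]
frequent = frequent_itemsets(transactions, min_support=2)
```

In a distributed setting the map step would run in parallel over partitions of the transactions and the reduce step would aggregate counts per itemset; here both run in one process purely to show the data flow.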