Frequent Itemset Mining Using Rough-Sets

Usman Qamar; Younus Javed

Abstract:

Frequent pattern mining is the process of finding patterns (sets of items, subsequences, substructures, etc.) that occur frequently in a data set. It was first proposed in the context of frequent itemsets and association rule mining. Frequent pattern mining is used to find inherent regularities in data, for example, which products are often purchased together. Its applications include basket data analysis, cross-marketing, catalog design, sales campaign analysis, Web log (click stream) analysis, and DNA sequence analysis. However, one of the bottlenecks of frequent itemset mining is that, as the data grow, the time and resources required to mine them increase at an exponential rate. In this investigation, a new algorithm is proposed which can be used as a pre-processor for frequent itemset mining. FASTER (FeAture SelecTion using Entropy and Rough sets) is a hybrid pre-processor algorithm which utilizes entropy and rough sets to carry out record reduction and feature (attribute) selection, respectively. FASTER for frequent itemset mining can produce a speed-up of 3.1 times compared to the original algorithm while maintaining an accuracy of 71%.
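
To make the notion of a frequent itemset concrete, the following is a minimal sketch of level-wise (Apriori-style) mining on a toy basket data set. It is not the FASTER pre-processor described above, whose details are not given in this abstract; the function name `frequent_itemsets`, the `min_support` threshold and the sample baskets are all assumptions chosen for illustration.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for itemsets with support >= min_support.

    Illustrative level-wise search with Apriori-style pruning; the name and
    parameters are hypothetical and not taken from the paper.
    """
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)
    # Level 1 candidates: every distinct single item.
    current = [frozenset([i]) for i in sorted({i for b in baskets for i in b})]
    result = {}
    k = 1
    while current:
        # Keep the candidates whose support (fraction of baskets) is high enough.
        level = {}
        for cand in current:
            support = sum(1 for b in baskets if cand <= b) / n
            if support >= min_support:
                level[cand] = support
        result.update(level)
        # Join frequent k-itemsets into (k+1)-candidates, then prune any
        # candidate that has an infrequent k-subset.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        current = [
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        ]
    return result


# Toy data (assumed): four shopping baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
]
found = frequent_itemsets(baskets, min_support=0.5)
for itemset, support in sorted(found.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), support)
```

With a support threshold of 0.5, the pair bread and milk is reported as frequent while butter and milk is not. The number of candidate itemsets can grow exponentially with the number of distinct items, which is the cost that record reduction and feature selection aim to lower before mining.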

Keywords: Rough-sets, Classification, Feature Selection, Entropy, Outliers, Frequent itemset mining.

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1096467
