A Distributed Approach to Extract High Utility Itemsets from XML Data

S. Kannimuthu; K. Premalatha

Abstract:

This paper investigates a new data mining capability: mining High Utility Itemsets (HUIs) in a distributed environment. Existing research in data mining considers only the presence or absence of items and does not take into account semantic measures such as the weight or cost of the items. HUI mining algorithms have evolved to address this limitation. HUI mining is a form of utility mining that aims to identify itemsets whose utility satisfies a given threshold. However, mining HUIs in a distributed environment, and mining them from XML data, have not yet been explored. In this work, a novel approach is proposed to mine HUIs from XML-based data in a distributed environment. The work uses the Service Oriented Computing (SOC) paradigm, which provides Knowledge as a Service (KaaS). The interesting patterns are delivered through web services, with the help of a knowledge server, to answer consumers' queries. The performance of the approach is evaluated on various databases in terms of execution time and memory consumption.
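To make the notion of an itemset "whose utility satisfies a given threshold" concrete, the following is a minimal sketch and not the distributed algorithm proposed in the paper. It assumes the commonly used definition in which an itemset's utility is the sum, over every transaction containing all of its items, of quantity times unit profit. The XML layout (`transactions`, `transaction`, `item`, `name`, `quantity`), the `UNIT_PROFIT` table, and all function names are hypothetical choices made for illustration; the abstract does not specify a schema. The search is plain brute force over all itemsets.

```python
import xml.etree.ElementTree as ET
from itertools import combinations

# Hypothetical XML layout; the paper's actual schema is not given in the abstract.
XML_DATA = """
<transactions>
  <transaction id="T1"><item name="A" quantity="2"/><item name="B" quantity="1"/></transaction>
  <transaction id="T2"><item name="A" quantity="1"/><item name="C" quantity="3"/></transaction>
  <transaction id="T3"><item name="A" quantity="1"/><item name="B" quantity="2"/><item name="C" quantity="1"/></transaction>
</transactions>
"""

# Assumed external utility (for example, unit profit) of each item.
UNIT_PROFIT = {"A": 5, "B": 3, "C": 1}


def parse_transactions(xml_text):
    """Return one {item name: quantity} dict per <transaction> element."""
    root = ET.fromstring(xml_text)
    return [
        {item.get("name"): int(item.get("quantity")) for item in tx.findall("item")}
        for tx in root.findall("transaction")
    ]


def itemset_utility(itemset, transactions, unit_profit):
    """Sum quantity * unit profit over transactions containing every item."""
    total = 0
    for tx in transactions:
        if all(item in tx for item in itemset):
            total += sum(tx[item] * unit_profit[item] for item in itemset)
    return total


def high_utility_itemsets(transactions, unit_profit, min_utility):
    """Brute-force enumeration: keep itemsets with utility >= min_utility."""
    items = sorted({item for tx in transactions for item in tx})
    result = {}
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            utility = itemset_utility(itemset, transactions, unit_profit)
            if utility >= min_utility:
                result[itemset] = utility
    return result


transactions = parse_transactions(XML_DATA)
huis = high_utility_itemsets(transactions, UNIT_PROFIT, min_utility=15)
print(huis)
```

In this toy data the itemset {A, B} has a higher utility than {A} alone, which illustrates that utility does not shrink monotonically as itemsets grow. Support-based pruning therefore cannot be applied directly, and exhaustive enumeration like the above becomes impractical on real databases.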

Keywords: Data mining, Knowledge as a Service, Service Oriented Computing, utility mining.

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1091770
