Noise Reduction in Web Data: A Learning Approach Based on Dynamic User Interests

Julius Onyancha; Valentina Plekhanova



Abstract:

A significant issue facing web users is the amount of noise in web data, which hinders finding useful information that matches their dynamic interests. Current research treats noise as any data that does not form part of the main web page, and proposes noise reduction tools that focus mainly on eliminating noise based on the content and layout of web data. This paper argues that not all data in the main web page is of interest to a user, and that not all data classed as noise is actually noise to a given user. Learning the noise web data associated with user requests therefore not only reduces the noise level in a web user profile but also decreases the loss of useful information, and so improves the quality of the profile. A Noise Web Data Learning (NWDL) tool/algorithm capable of learning noise web data in a web user profile is proposed. It eliminates noise data in relation to dynamic user interest. To validate its performance, an experimental design setup is presented, and the results are compared with current algorithms used for noise web data reduction. The experimental results show that the proposed work considers dynamic changes in user interest before eliminating noise data. It thereby contributes to improving the quality of a web user profile by reducing the amount of useful information eliminated as noise.

Keywords: Web log data, web user profile, user interest, noise web data learning, machine learning.
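
The abstract does not specify how NWDL works internally, so the following is only a minimal sketch of the general idea it describes: deciding what counts as noise per user and per point in time, rather than by page layout. It assumes, purely for illustration, that interest can be approximated by dwell time discounted by the age of the visit. The names `Visit`, `interest_scores`, `split_profile`, and the `threshold` and `half_life_days` parameters are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass
class Visit:
    url: str              # requested page, as it might appear in a web log
    day: int              # day index on which the request was made
    dwell_seconds: float  # time the user spent on the page


def interest_scores(visits, today, half_life_days=7.0):
    """Sum dwell time per URL, halving a visit's weight every half_life_days."""
    scores = {}
    for v in visits:
        age = today - v.day
        weight = 0.5 ** (age / half_life_days)  # assumed decay model
        scores[v.url] = scores.get(v.url, 0.0) + weight * v.dwell_seconds
    return scores


def split_profile(visits, today, threshold=30.0, half_life_days=7.0):
    """Split a user's visited URLs into (useful, noise) by current interest."""
    scores = interest_scores(visits, today, half_life_days)
    useful = sorted(u for u, s in scores.items() if s >= threshold)
    noise = sorted(u for u, s in scores.items() if s < threshold)
    return useful, noise


# Hypothetical log for one user, evaluated on day 10.
visits = [
    Visit("/ads/banner", day=9, dwell_seconds=120.0),   # layout tools would drop this
    Visit("/news/article", day=0, dwell_seconds=40.0),  # main content, but stale
    Visit("/sports", day=10, dwell_seconds=45.0),
]
useful, noise = split_profile(visits, today=10)
print(useful, noise)
```

In this toy log, a page that a layout-based tool would discard is kept because the user recently spent time on it, while a main-content page visited long ago falls below the threshold. This mirrors the abstract's claim that noise depends on the user's current interests, under the stated assumptions only.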

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1314911

