Commenced in January 2007
Frequency: Monthly
Edition: International
Signed Approach for Mining Web Content Outliers

Authors: G. Poonkuzhali, K. Thiagarajan, K. Sarukesi, G. V. Uma


The emergence of the Internet has revolutionized information storage and retrieval. Because most data on the web is unstructured and mixes text, video, audio, and other media, information must be mined to meet the specific needs of users without losing important hidden information. Developing user-friendly, automated tools that deliver relevant information quickly has therefore become a major challenge in web mining research. Most existing web mining algorithms concentrate on finding frequent patterns and neglect the less frequent ones, which are likely to contain outlying data such as noise and irrelevant or redundant content. This paper focuses on a Signed approach, combined with full-word matching against an organized domain dictionary, for mining web content outliers. The Signed approach identifies both the relevant web documents and the outlying web documents. Because the dictionary is organized by the number of characters in each word, searching and retrieving documents takes less time and less space.
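The abstract does not spell out the scoring rule, so the following is only a minimal sketch of the two ideas it names: a domain dictionary bucketed by word length, and full-word matching against it. The function names and the scoring scheme (+1 for a token found in the dictionary, -1 otherwise, with a positive total meaning "relevant") are assumptions made for illustration and are not taken from the paper.

```python
import re
from collections import defaultdict


def build_length_index(domain_words):
    """Group dictionary words by character count so a lookup scans one bucket only."""
    index = defaultdict(set)
    for word in domain_words:
        w = word.lower()
        index[len(w)].add(w)
    return index


def is_domain_word(index, token):
    """Full-word match: the token must equal a dictionary entry of the same length."""
    return token in index.get(len(token), ())


def signed_score(index, text):
    """Hypothetical scoring (an assumption): +1 per dictionary token, -1 otherwise."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return sum(1 if is_domain_word(index, t) else -1 for t in tokens)


def classify(index, documents):
    """Label a document 'relevant' if its signed score is positive, else 'outlier'."""
    return {
        name: ("relevant" if signed_score(index, text) > 0 else "outlier")
        for name, text in documents.items()
    }


if __name__ == "__main__":
    index = build_length_index(["web", "mining", "outlier", "content"])
    docs = {
        "a": "web content mining outlier",
        "b": "cheap flights and hotel deals",
    }
    print(classify(index, docs))
```

Bucketing by length means a token is compared only against entries with the same number of characters, which is one plausible reading of why the organized dictionary reduces search time. Note that partial matches such as "min" against "mining" are rejected under full-word matching.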

Keywords: outliers, web content mining, web documents, relevant document, Signed approach



