A Comparative Study of Web-pages Classification Methods using Fuzzy Operators Applied to Arabic Web-pages
Authors: Ahmad T. Al-Taani, Noor Aldeen K. Al-Awad
Abstract:
In this study, a fuzzy similarity approach to Arabic web page classification is presented. The approach uses a fuzzy term-category relation, manipulating the membership degrees obtained from the training data and the degree values of a test web page. Six measures are used and compared: Einstein, Algebraic, Hamacher, MinMax, Special case fuzzy, and Bounded Difference. The measures are applied to 50 different Arabic web pages. The Einstein measure gave the best performance of the six. An analysis of these measures and concluding remarks are also presented.
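To make the comparison of measures concrete, the following is a minimal Python sketch of how such fuzzy operators could drive a similarity score. It is not the authors' implementation. It assumes the textbook definitions of the Einstein, Algebraic, Hamacher, MinMax, and Bounded Difference conjunction/disjunction pairs, and it assumes one common form of fuzzy similarity: the sum of term-wise conjunctions divided by the sum of term-wise disjunctions. The "Special case fuzzy" measure is omitted because the abstract does not define it. All function names, the `OPERATORS` table, and the toy membership values are hypothetical.

```python
from typing import Callable, Dict, Tuple

Op = Callable[[float, float], float]

# Textbook fuzzy conjunction (t-norm) / disjunction (t-conorm) pairs on [0, 1].
def einstein_product(a: float, b: float) -> float:
    return a * b / (2.0 - (a + b - a * b))

def einstein_sum(a: float, b: float) -> float:
    return (a + b) / (1.0 + a * b)

def algebraic_product(a: float, b: float) -> float:
    return a * b

def algebraic_sum(a: float, b: float) -> float:
    return a + b - a * b

def hamacher_product(a: float, b: float) -> float:
    denom = a + b - a * b
    return 0.0 if denom == 0.0 else a * b / denom  # both inputs are 0

def hamacher_sum(a: float, b: float) -> float:
    denom = 1.0 - a * b
    return 1.0 if denom == 0.0 else (a + b - 2.0 * a * b) / denom  # both inputs are 1

def bounded_difference(a: float, b: float) -> float:
    return max(0.0, a + b - 1.0)

def bounded_sum(a: float, b: float) -> float:
    return min(1.0, a + b)

# Hypothetical lookup table: measure name -> (conjunction, disjunction).
OPERATORS: Dict[str, Tuple[Op, Op]] = {
    "einstein": (einstein_product, einstein_sum),
    "algebraic": (algebraic_product, algebraic_sum),
    "hamacher": (hamacher_product, hamacher_sum),
    "minmax": (min, max),
    "bounded": (bounded_difference, bounded_sum),
}

def fuzzy_similarity(category: Dict[str, float], page: Dict[str, float],
                     t_norm: Op, t_conorm: Op) -> float:
    """Assumed similarity form: sum of conjunctions / sum of disjunctions."""
    terms = set(category) | set(page)
    num = sum(t_norm(category.get(t, 0.0), page.get(t, 0.0)) for t in terms)
    den = sum(t_conorm(category.get(t, 0.0), page.get(t, 0.0)) for t in terms)
    return num / den if den > 0.0 else 0.0

def classify(page: Dict[str, float],
             categories: Dict[str, Dict[str, float]], measure: str) -> str:
    """Return the category whose term memberships are most similar to the page."""
    t_norm, t_conorm = OPERATORS[measure]
    return max(categories,
               key=lambda c: fuzzy_similarity(categories[c], page, t_norm, t_conorm))

# Toy membership degrees (invented for illustration only).
categories = {
    "sports": {"goal": 0.9, "match": 0.8},
    "economy": {"market": 0.9, "bank": 0.7},
}
page = {"goal": 0.6, "match": 0.4}

for name in OPERATORS:
    print(name, classify(page, categories, name))
```

Keeping each measure as a (conjunction, disjunction) pair means the same similarity function can be reused unchanged while only the operators vary, which mirrors the kind of side-by-side comparison the abstract describes.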
Keywords: Text classification, HTML, web pages, machine learning, fuzzy logic, Arabic web pages.
Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1078967