An Empirical Analysis of Arabic WebPages Classification using Fuzzy Operators

Authors: Ahmad T. Al-Taani, Noor Aldeen K. Al-Awad

Abstract:

This study presents a fuzzy similarity approach for classifying Arabic web pages. The approach builds a fuzzy term-category relation from the membership degrees of terms in the training data and combines it with the membership degrees of terms in a test web page. Six measures are used and compared: Einstein, Algebraic, Hamacher, MinMax, Special Case Fuzzy, and Bounded Difference. The measures are applied to 50 different Arabic web pages. The Einstein measure gave the best performance of the six. The study closes with an analysis of these measures and concluding remarks.
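
The abstract does not give the formulas behind the measures, so the following is only a minimal sketch under stated assumptions. It uses the standard textbook definitions of the Einstein, algebraic, Hamacher, min/max, and bounded operators, and it assumes a similarity score of the form "sum of fuzzy conjunctions divided by sum of fuzzy disjunctions" over the terms. The "Special Case Fuzzy" measure is not defined here and is left out. All names (`OPERATORS`, `fuzzy_similarity`, `classify`) and the toy membership values are hypothetical.

```python
from typing import Callable, Dict, Tuple

FuzzyOp = Callable[[float, float], float]
Memberships = Dict[str, float]  # term -> membership degree in [0, 1]


def hamacher_product(a: float, b: float) -> float:
    denom = a + b - a * b
    return 0.0 if denom == 0 else a * b / denom  # guard the 0/0 case


def hamacher_sum(a: float, b: float) -> float:
    denom = 1 - a * b
    return 1.0 if denom == 0 else (a + b - 2 * a * b) / denom  # guard a = b = 1


# Each entry pairs a fuzzy conjunction (t-norm) with its dual disjunction (t-conorm).
# Assumption: these are textbook definitions, not formulas taken from the paper.
OPERATORS: Dict[str, Tuple[FuzzyOp, FuzzyOp]] = {
    "einstein": (lambda a, b: a * b / (2 - (a + b - a * b)),
                 lambda a, b: (a + b) / (1 + a * b)),
    "algebraic": (lambda a, b: a * b,
                  lambda a, b: a + b - a * b),
    "hamacher": (hamacher_product, hamacher_sum),
    "minmax": (min, max),
    # Naming varies across texts; this is the max(0, a + b - 1) form.
    "bounded": (lambda a, b: max(0.0, a + b - 1),
                lambda a, b: min(1.0, a + b)),
}


def fuzzy_similarity(page: Memberships, category: Memberships, name: str) -> float:
    """Assumed score: sum of conjunctions over sum of disjunctions."""
    t_norm, t_conorm = OPERATORS[name]
    terms = set(page) | set(category)
    num = sum(t_norm(category.get(t, 0.0), page.get(t, 0.0)) for t in terms)
    den = sum(t_conorm(category.get(t, 0.0), page.get(t, 0.0)) for t in terms)
    return num / den if den else 0.0


def classify(page: Memberships, categories: Dict[str, Memberships], name: str) -> str:
    """Return the category whose term memberships are most similar to the page."""
    return max(categories, key=lambda c: fuzzy_similarity(page, categories[c], name))


# Toy data (hypothetical): English placeholder terms stand in for Arabic terms.
categories = {
    "sports": {"match": 0.9, "goal": 0.7, "bank": 0.1},
    "economy": {"bank": 0.9, "market": 0.8, "match": 0.1},
}
page = {"match": 0.8, "goal": 0.6}

for op_name in OPERATORS:
    print(op_name, classify(page, categories, op_name))
```

Keeping the operator pairs in one table makes it straightforward to run the same classifier under each measure and compare the results, which mirrors the kind of comparison the abstract describes.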

Keywords: Text classification, HTML documents, Web pages, Machine learning, Fuzzy logic, Arabic Web pages.

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1084356

