{"title":"Hybrid Machine Learning Approach for Text Categorization","authors":"Nerijus Remeikis, Ignas Skucas, Vida Melninkaite","volume":5,"journal":"International Journal of Computer and Information Engineering","pagesStart":1539,"pagesEnd":1544,"ISSN":"1307-6892","URL":"https:\/\/publications.waset.org\/pdf\/9621","abstract":"
Text categorization, the assignment of natural language documents to one or more predefined categories based on their semantic content, is an important component in many information organization and management tasks. The performance of neural network learning is known to be sensitive to the initial weights and architecture. This paper discusses initializing a multilayer neural network with a decision tree classifier to improve text categorization accuracy. An adaptation of the algorithm is proposed in which a decision tree, from the root node to a final leaf, is used to initialize the multilayer neural network. The experimental evaluation demonstrates that this approach provides better classification accuracy on the Reuters-21578 corpus, one of the standard benchmarks for text categorization tasks. We present results comparing the accuracy of this approach with that of a multilayer neural network initialized with the traditional random method and with that of decision tree classifiers.","references":"[1] Banerji, A. (1997). Initializing neural networks using decision trees.\r\nComputational Learning Theory and Natural Learning Systems, MIT\r\nPress, IV, 3-15.\r\n[2] Frakes, W., and R. Baeza-Yates (1992). Information Retrieval: Data\r\nStructures & Algorithms, Prentice Hall.\r\n[3] Haykin, S. (1994). Neural Networks: A comprehensive foundation.\r\nMacmillan College Publishing Comp., New York.\r\n[4] Yang, Y., and J. Pedersen (1997). A comparative study on feature\r\nselection in text categorization. In Proceedings of ICML-97, 14th\r\nInternational Conference on Machine Learning, Nashville, US, 412-420.\r\n[5] Yang, Y., and X. Liu (1999). A re-examination of text categorization\r\nmethods. In Proceedings of SIGIR-99, 22nd ACM International\r\nConference on Research and Development in Information Retrieval,\r\nBerkeley, US, 42-49.\r\n[6] Lewis, D.D., and M. Ringuette (1994). A comparison of two learning\r\nalgorithms for text categorization. 
In Proceedings of SDAIR-94, 3rd\r\nAnnual Symposium on Document Analysis and Information Retrieval,\r\nLas Vegas, 81-93.\r\n[7] Quinlan, J.R. (1993). C4.5: Programs for machine learning, Morgan\r\nKaufmann, San Mateo, CA.\r\n[8] Rumelhart, D.E., and J.L. McClelland (1986). Parallel distributed\r\nprocessing 1. MIT Press, Cambridge, MA.\r\n[9] Sebastiani, F. (2002). Machine learning in automated text categorization.\r\nACM Computing Surveys, 34(1), 1-47.\r\n[10] van Rijsbergen, C.J. (1979). Information Retrieval. Butterworths,\r\nLondon.\r\n[11] Wiener, E.D., J.O. Pedersen, and A. S. Weigend (1995). A neural\r\nnetwork approach to topic spotting. In Proceedings of SDAIR-95, 4th\r\nAnnual Symposium on Document Analysis and Information Retrieval,\r\nLas Vegas, 317-332.\r\n[12] Joachims, T. (1998). Text categorization with support vector machines:\r\nlearning with many relevant features. In Proceedings of ECML-98, 10th\r\nEuropean Conference on Machine Learning, 137-142.\r\n[13] Dumais, S. T. (1991). Improving the retrieval of information from external\r\nsources. Behaviour Research Methods, Instruments and Computers,\r\n23(2), 229-236.","publisher":"World Academy of Science, Engineering and Technology","index":"Open Science Index 5, 2007"}