A Proposed Hybrid Approach for Feature Selection in Text Document Categorization

Authors: M. F. Zaiyadi, B. Baharudin

Abstract:

Text document categorization involves large amounts of data and a large number of features. This high dimensionality is troublesome and can degrade classification performance. Feature selection is therefore considered one of the crucial parts of text document categorization. Selecting the best features to represent documents reduces the dimensionality of the feature space and hence can improve performance. Various researchers have implemented many approaches to address this problem. This paper proposes a novel hybrid approach for feature selection in text document categorization based on Ant Colony Optimization (ACO) and Information Gain (IG). We also present state-of-the-art algorithms by several other researchers.
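To make the Information Gain component concrete, the following is a minimal sketch of IG-based term ranking on a toy corpus. It is not the hybrid ACO-IG method proposed in the paper, whose details are not given in this abstract. It assumes documents are represented as sets of terms and that each term is treated as a binary presence feature. The names `entropy`, `information_gain`, and `top_k_terms`, as well as the toy data, are hypothetical and introduced only for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(docs, labels, term):
    """IG of a binary term-presence feature with respect to the class labels.

    Assumption: each document is a set of terms, so a term is either
    present or absent in a document.
    """
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    # Class entropy remaining after splitting on presence of the term.
    conditional = (len(with_term) / n) * entropy(with_term) + (
        len(without_term) / n
    ) * entropy(without_term)
    return entropy(labels) - conditional


def top_k_terms(docs, labels, k):
    """Return the k terms with the highest IG (ties broken alphabetically)."""
    vocab = sorted(set().union(*docs))
    scored = [(information_gain(docs, labels, t), t) for t in vocab]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [t for _, t in scored[:k]]


# Hypothetical toy corpus: two "sport" and two "politics" documents.
docs = [
    {"ball", "goal", "team"},
    {"goal", "match", "team"},
    {"vote", "party", "team"},
    {"vote", "election", "party"},
]
labels = ["sport", "sport", "politics", "politics"]

selected = top_k_terms(docs, labels, 3)
print(selected)
```

In this toy corpus, terms such as "goal" separate the two classes perfectly and receive the maximum score, while "team" appears in both classes and scores lower. A filter of this kind only ranks terms individually; a search procedure such as ACO would additionally explore combinations of features.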

Keywords: Ant colony optimization, feature selection, information gain, text categorization, text representation.

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1084234

