Evaluating some Feature Selection Methods for an Improved SVM Classifier

Authors: Daniel Morariu, Lucian N. Vintan, Volker Tresp

Abstract:

Text categorization is the problem of classifying text documents into a set of predefined classes. After a preprocessing step, the documents are typically represented as large sparse vectors. When training classifiers on large collections of documents, both time and memory requirements can be prohibitive. This justifies the application of feature selection methods to reduce the dimensionality of the document-representation vector. Four feature selection methods are evaluated: Random Selection, Information Gain (IG), Support Vector Machine (called SVM_FS) and Genetic Algorithm with SVM (GA_FS). We show that SVM_FS and GA_FS give the best results with a relatively small feature vector, whereas the IG method needs longer vectors to reach similar classification accuracy. We also present a novel method to better correlate the SVM kernel's parameters (polynomial or Gaussian kernel).
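
To make the Information Gain (IG) criterion concrete, the following is a minimal sketch of IG-based term ranking on a toy corpus. It uses the textbook definition of information gain for a binary "term present / term absent" feature; the abstract does not give the exact formulation used in the paper, so the function names (`entropy`, `information_gain`, `select_top_k`), the toy documents and the choice of binary term presence are all assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(docs, labels, term):
    """Information gain of the binary feature 'term occurs in the document'."""
    present = [y for d, y in zip(docs, labels) if term in d]
    absent = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    # Class entropy remaining after splitting the corpus on the term.
    conditional = (len(present) / n) * entropy(present) \
        + (len(absent) / n) * entropy(absent)
    return entropy(labels) - conditional


def select_top_k(docs, labels, k):
    """Return the k terms with the highest information gain."""
    vocab = sorted(set().union(*docs))
    ranked = sorted(vocab,
                    key=lambda t: information_gain(docs, labels, t),
                    reverse=True)
    return ranked[:k]


# Hypothetical toy corpus: each document is a set of terms.
docs = [
    {"stock", "market", "report"},
    {"stock", "price", "report"},
    {"match", "goal", "report"},
    {"match", "team", "report"},
]
labels = ["finance", "finance", "sport", "sport"]

top_terms = select_top_k(docs, labels, 2)
```

In this toy corpus, "stock" and "match" separate the two classes perfectly and are ranked first, while "report" occurs in every document and carries no information about the class. Keeping only the top-ranked terms is what shortens the document-representation vector.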

Keywords: Feature selection, learning with kernels, support vector machine, genetic algorithms, classification.

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1060952

