Feature Selection Methods for an Improved SVM Classifier

Authors: Daniel Morariu, Lucian N. Vintan, Volker Tresp

Abstract:

Text categorization is the problem of classifying text documents into a set of predefined classes. After a preprocessing step, the documents are typically represented as large sparse vectors. When classifiers are trained on large document collections, the time and memory requirements can be prohibitive. This justifies applying feature selection methods to reduce the dimensionality of the document-representation vector. In this paper, three feature selection methods are evaluated: Random Selection, Information Gain (IG) and Support Vector Machine feature selection (called SVM_FS). We show that the best results are obtained with the SVM_FS method for a relatively small dimension of the feature vector. We also present a novel method for better correlating the parameters of the SVM kernel (polynomial or Gaussian).
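
To make two of the selection methods named above more concrete, the following is a minimal sketch of Random Selection and Information Gain ranking on a toy corpus. It assumes the standard textbook definition of information gain (class entropy minus the entropy remaining after splitting on whether a term is present), which may differ in detail from the formulation used in the paper. The function names, the toy documents and the labels are hypothetical and serve only as illustration; SVM_FS is not shown because it requires training a linear SVM.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(docs, labels, term):
    """IG of the binary feature 'term present / absent' (assumed definition)."""
    present = [y for d, y in zip(docs, labels) if term in d]
    absent = [y for d, y in zip(docs, labels) if term not in d]
    # Weighted entropy of the two partitions; empty partitions contribute nothing.
    conditional = sum(len(part) / len(labels) * entropy(part)
                      for part in (present, absent) if part)
    return entropy(labels) - conditional


def select_by_ig(docs, labels, k):
    """Keep the k terms with the highest information gain."""
    vocab = sorted(set().union(*docs))
    ranked = sorted(vocab,
                    key=lambda t: information_gain(docs, labels, t),
                    reverse=True)
    return ranked[:k]


def select_random(docs, k, seed=0):
    """Baseline: keep k terms drawn uniformly at random from the vocabulary."""
    vocab = sorted(set().union(*docs))
    return random.Random(seed).sample(vocab, k)


# Hypothetical toy corpus: each document is a set of terms.
docs = [
    {"stock", "market", "the"},
    {"stock", "shares", "the"},
    {"match", "goal", "the"},
    {"goal", "team", "the"},
]
labels = ["finance", "finance", "sport", "sport"]

if __name__ == "__main__":
    print(select_by_ig(docs, labels, 2))
    print(select_random(docs, 2))
```

In this toy corpus a term such as "the", which occurs in every document, has zero information gain, while "stock" and "goal" separate the two classes perfectly and are ranked first. Random Selection ignores the labels entirely, which is why it serves only as a baseline.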

Keywords: Feature Selection, Learning with Kernels, Support Vector Machine, Classification.

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1332490
