Classification Influence Index and its Application for k-Nearest Neighbor Classifier

Authors: Sejong Oh

Abstract:

Classification is an important topic in machine learning and bioinformatics. Many datasets have been introduced for classification tasks. A dataset contains multiple features, and the quality of those features influences the classification accuracy that can be achieved on the dataset; the classification power of individual features differs. In this study, we propose the Classification Influence Index (CII) as an indicator of the classification power of each feature. CII enables the evaluation of the features in a dataset and improves classification accuracy through a transformation of the dataset. Experiments on real datasets using CII together with the k-nearest neighbor classifier confirm that the proposed index yields a meaningful improvement in classification accuracy.
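The overall workflow described in the abstract can be pictured as scoring each feature, rescaling (transforming) the dataset by those scores, and then classifying with k-NN. The following is a minimal sketch of that idea only: the per-feature score used below is a simple between-class to within-class variance ratio chosen for illustration, not the CII formula, which the abstract does not reproduce; the dataset, weighting scheme, and k value are likewise assumptions.

    # Illustrative sketch: weight features by a per-feature score, then run k-NN.
    # The score here is a stand-in (variance ratio), NOT the paper's CII formula.
    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier

    def feature_scores(X, y):
        """Stand-in score: between-class variance over within-class variance."""
        scores = np.zeros(X.shape[1])
        classes = np.unique(y)
        overall_mean = X.mean(axis=0)
        for j in range(X.shape[1]):
            between = sum((X[y == c, j].mean() - overall_mean[j]) ** 2 for c in classes)
            within = sum(X[y == c, j].var() for c in classes) + 1e-12
            scores[j] = between / within
        return scores / scores.sum()  # normalize so weights sum to 1

    X, y = load_iris(return_X_y=True)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

    w = feature_scores(X_tr, y_tr)          # per-feature influence weights
    knn = KNeighborsClassifier(n_neighbors=5)

    knn.fit(X_tr, y_tr)                     # baseline: untransformed data
    print("plain k-NN accuracy:   ", knn.score(X_te, y_te))

    knn.fit(X_tr * w, y_tr)                 # transformed: features scaled by weights
    print("weighted k-NN accuracy:", knn.score(X_te * w, y_te))

The intent of the sketch is only to show where a feature-level index plugs into a k-NN pipeline: the index is computed on the training split and the same weights are applied to both training and test data before classification.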

Keywords: accuracy, classification, dataset, data preprocessing

Digital Object Identifier (DOI): https://doi.org/10.5281/zenodo.1084320

