Assessing and Visualizing the Stability of Feature Selectors: A Case Study with Spectral Data
Feature selection plays an important role in applications with high-dimensional data. Assessing the stability of feature selection and ranking algorithms becomes important when the dataset is small and the aim is to gain insight into the underlying process by analyzing the most relevant features. In this work, we propose a graphical approach that makes it possible to analyze the similarity between feature ranking techniques as well as their individual stability. Moreover, it works with any stability metric (e.g., Canberra distance, Spearman's rank correlation coefficient, Kuncheva's stability index). We illustrate this visualization technique by evaluating the stability of several feature selection techniques on a spectral binary dataset. Experimental results with a neural-based classifier show that stability and ranking quality may not be linked and that both issues have to be studied jointly in order to offer answers to domain experts.
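As a rough illustration of the stability metrics named above, the following minimal sketch compares feature rankings pairwise. It assumes the common textbook definitions of each metric (Spearman's coefficient without ties, the unweighted Canberra distance on rank vectors, and Kuncheva's index on the top-k subsets); the variants used in the paper may differ. All function names are hypothetical, and a ranking is assumed to be a list of feature indices ordered from most to least relevant.

```python
from itertools import combinations


def ranks_from_order(order):
    """Turn an ordering (feature indices, best first) into 1-based ranks per feature."""
    ranks = [0] * len(order)
    for position, feature in enumerate(order, start=1):
        ranks[feature] = position
    return ranks


def spearman(order_a, order_b):
    """Spearman's rank correlation for two full rankings without ties."""
    ra, rb = ranks_from_order(order_a), ranks_from_order(order_b)
    n = len(ra)
    d2 = sum((x - y) ** 2 for x, y in zip(ra, rb))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


def canberra(order_a, order_b):
    """Unweighted Canberra distance between the two rank vectors (0 means identical)."""
    ra, rb = ranks_from_order(order_a), ranks_from_order(order_b)
    return sum(abs(x - y) / (x + y) for x, y in zip(ra, rb))


def kuncheva(order_a, order_b, k):
    """Kuncheva's index for the top-k subsets, corrected for chance overlap."""
    n = len(order_a)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < n")
    r = len(set(order_a[:k]) & set(order_b[:k]))
    return (r * n - k * k) / (k * (n - k))


def mean_pairwise(orders, metric):
    """Average a pairwise metric over all pairs of rankings (e.g. from resampled data)."""
    pairs = list(combinations(orders, 2))
    return sum(metric(a, b) for a, b in pairs) / len(pairs)


# Toy rankings of five features, invented for illustration only.
runs = [[0, 1, 2, 3, 4], [0, 2, 1, 3, 4], [1, 0, 2, 4, 3]]
stability_spearman = mean_pairwise(runs, spearman)
stability_kuncheva = mean_pairwise(runs, lambda a, b: kuncheva(a, b, k=2))
```

Averaging a pairwise metric over rankings obtained from different resamples of the data gives one stability score per feature ranking technique; the same pairwise values could also be collected into a dissimilarity matrix for plotting.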
Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1083751