Clustering Multivariate Empiric Characteristic Functions for Multi-Class SVM Classification

Authors: María-Dolores Cubiles-de-la-Vega, Rafael Pino-Mejías, Esther-Lydia Silva-Ramírez

Abstract:

A dissimilarity measure between the empiric characteristic functions of the subsamples associated with the different classes in a multivariate data set is proposed. This measure can be computed efficiently and depends on all the cases of each class. It may be used to find groups of similar classes, which could be merged for further analysis, or to perform an agglomerative hierarchical cluster analysis of the set of classes. The resulting tree can then serve to build a family of binary classification models, offering an alternative approach to the multi-class SVM problem. We have compared this dendrogram-based SVM approach with the one-against-one SVM approach on four publicly available data sets, three of them microarray data. The two approaches achieve equivalent performance, but the dendrogram-based solution requires a smaller number of binary SVM models.
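Since only the abstract is reproduced here, the exact form of the proposed dissimilarity is not available. The R sketch below merely illustrates the kind of pipeline the abstract describes, assuming an integrated squared distance between the empirical characteristic functions (ECFs) of two class subsamples, approximated over a random set of frequency vectors. The function names (ecf_value, ecf_dissimilarity, class_dissimilarity_matrix) and the choice of frequency distribution are illustrative assumptions, not the authors' method.

# Empirical characteristic function of a sample X (n x p) evaluated at a
# single frequency vector t (length p): phi_hat(t) = (1/n) sum_j exp(i t'x_j)
ecf_value <- function(X, t) {
  angles <- as.matrix(X) %*% t            # t'x_j for every case j
  complex(real = mean(cos(angles)), imaginary = mean(sin(angles)))
}

# Assumed dissimilarity between the ECFs of two subsamples: average squared
# modulus of their difference over a grid of random frequency vectors.
ecf_dissimilarity <- function(XA, XB, Tgrid) {
  mean(apply(Tgrid, 1, function(t) Mod(ecf_value(XA, t) - ecf_value(XB, t))^2))
}

# Pairwise dissimilarities between all classes of a labelled data set,
# using the same random frequency grid for every pair of classes.
class_dissimilarity_matrix <- function(X, y, n_freq = 200, sd_freq = 1) {
  classes <- levels(factor(y))
  Tgrid   <- matrix(rnorm(n_freq * ncol(X), sd = sd_freq), ncol = ncol(X))
  D <- matrix(0, length(classes), length(classes),
              dimnames = list(classes, classes))
  for (a in seq_along(classes))
    for (b in seq_along(classes))
      if (a < b)
        D[a, b] <- D[b, a] <- ecf_dissimilarity(X[y == classes[a], , drop = FALSE],
                                                X[y == classes[b], , drop = FALSE],
                                                Tgrid)
  as.dist(D)
}

# Example: agglomerative hierarchical clustering of the iris species
# according to the assumed ECF dissimilarity.
set.seed(1)
D    <- class_dissimilarity_matrix(iris[, 1:4], iris$Species)
tree <- hclust(D, method = "average")
plot(tree, main = "Classes clustered by ECF dissimilarity")

The resulting dendrogram orders the classes by ECF similarity; a binary SVM (for instance with e1071::svm) could then be fitted at each internal node of the tree, which is the alternative to one-against-one training that the abstract describes.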

Keywords: Cluster Analysis, Empiric Characteristic Function, Multi-class SVM, R.

Digital Object Identifier (DOI): https://doi.org/10.5281/zenodo.1084312

