On the Efficient Implementation of a Serial and Parallel Decomposition Algorithm for Fast Support Vector Machine Training Including a Multi-Parameter Kernel
Authors: Tatjana Eitrich, Bruno Lang
Abstract:
This work deals with aspects of support vector machine learning for large-scale data mining tasks. Based on a decomposition algorithm for support vector machine training that can be run in serial as well as in shared-memory parallel mode, we introduce a transformation of the training data that allows an expensive generalized (multi-parameter) kernel to be used without additional cost. We present experiments for the Gaussian kernel, but other kernel functions can be used as well. To further speed up the decomposition algorithm, we analyze the critical problem of working set selection for large training data sets. In addition, we examine the influence of the working set size on the scalability of the parallel decomposition scheme. Our tests and conclusions led to several modifications of the algorithm and to improved overall support vector machine learning performance. The resulting method allows extensive parameter search to be applied for optimizing classification accuracy.
Keywords: Support Vector Machine Training, Multi-Parameter Kernels, Shared Memory Parallel Computing, Large Data
Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1077135
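The transformation mentioned in the abstract can be illustrated with a small sketch. Assuming the generalized kernel is a Gaussian with one width parameter per feature, k(x, z) = exp(-0.5 * sum_i (x_i - z_i)^2 / sigma_i^2), rescaling every feature once via x_i -> x_i / sigma_i reduces it to the ordinary single-parameter Gaussian kernel, so the per-feature parameters add no cost inside the training loop. The Python/NumPy code below only illustrates this feature-scaling idea under that assumption; the function names are hypothetical and not taken from the authors' implementation.

import numpy as np

def scale_features(X, sigma):
    """Divide each column of the n x d data matrix X by its width sigma_i (done once, before training)."""
    return X / sigma                      # broadcasts over rows; sigma is a length-d array

def gaussian_kernel(x, z):
    """Standard Gaussian kernel with unit width, applied to already-scaled vectors."""
    diff = x - z
    return np.exp(-0.5 * np.dot(diff, diff))

def multi_parameter_kernel(x, z, sigma):
    """Generalized kernel with one width per feature; reference version for comparison."""
    diff = (x - z) / sigma
    return np.exp(-0.5 * np.dot(diff, diff))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(5, 3))           # toy data: 5 points, 3 features
    sigma = np.array([0.5, 2.0, 1.5])     # one (hypothetical) width per feature
    Xs = scale_features(X, sigma)
    # Both evaluations agree, so the cheap standard kernel can be used after the one-time scaling.
    print(gaussian_kernel(Xs[0], Xs[1]))
    print(multi_parameter_kernel(X[0], X[1], sigma))

Because the scaling is applied once to the whole data set, every kernel evaluation inside the serial or shared-memory parallel decomposition loop costs exactly as much as with the standard Gaussian kernel, which is the kind of saving that makes extensive per-feature parameter search affordable.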
References:
[1] A. Kless and T. Eitrich, "Cytochrome p450 classification of drugs with support vector machines implementing the nearest point algorithm," in Knowledge Exploration in Life Science Informatics, International Symposium, KELSI 2004, Milan, Italy, ser. Lecture Notes in Computer Science, J. A. López, E. Benfenati, and W. Dubitzky, Eds., vol. 3303. Springer, 2004, pp. 191-205.
[2] T. Joachims, "Text categorization with support vector machines: learning with many relevant features," in Proceedings of ECML-98, 10th European Conference on Machine Learning. Chemnitz, Germany: Springer, 1998, pp. 137-142.
[3] B. Schölkopf, "The kernel trick for distances," in NIPS, 2000, pp. 301-307.
[4] E. Osuna, R. Freund, and F. Girosi, "Support vector machines: training and applications," Massachusetts Institute of Technology, AI Memo 1602, 1997.
[5] V. N. Vapnik, Statistical learning theory. New York: John Wiley & Sons, 1998.
[6] C.-J. Lin, "On the convergence of the decomposition method for support vector machines," IEEE Transactions on Neural Networks, vol. 12, no. 6, pp. 1288-1298, 2001.
[7] M. Momma and K. P. Bennett, "A pattern search method for model selection of support vector regression," in Proceedings of the Second SIAM International Conference on Data Mining, Arlington, VA, USA, April 11-13, 2002, R. L. Grossman, J. Han, V. Kumar, H. Mannila, and R. Motwani, Eds. SIAM, 2002.
[8] C. Staelin, "Parameter selection for support vector machines," HP Laboratories, Israel, Technical Report HPL-2002-354, 2002.
[9] H. Nunez, C. Angulo, and A. Catala, "Support vector machines with symbolic interpretation," VII Simpósio Brasileiro de Redes Neurais - Recife, PE, Brasil, pp. 142-147, 2002. (Online). Available: http://doi.ieeecomputersociety.org/10.1109/SBRN.2002.1181456
[10] H. P. Graf, E. Cosatto, L. Bottou, I. Dourdanovic, and V. Vapnik, "Parallel support vector machines: the cascade SVM," in Advances in Neural Information Processing Systems 17, L. K. Saul, Y. Weiss, and L. Bottou, Eds. Cambridge, MA: MIT Press, 2005, pp. 521-528.
[11] V. Ruggiero and L. Zanni, "On the efficiency of splitting and projection methods for large strictly convex quadratic programs," Applied Optimization, pp. 401-413, 1999.
[12] H. Yu, J. Yang, and J. Han, "Classifying large data sets using SVMs with hierarchical clusters," in Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, August 24 - 27, 2003, L. Getoor, T. E. Senator, P. Domingos, and C. Faloutsos, Eds. ACM, 2003, pp. 306-315.
[13] S. S. Keerthi, "Efficient tuning of SVM hyperparameters using radius/margin bound and iterative algorithms," IEEE Transactions on Neural Networks, vol. 13, pp. 1225-1229, 2002.
[14] R. Collobert, S. Bengio, and Y. Bengio, "A parallel mixture of SVMs for very large scale problems," Neural Computation, vol. 14, no. 5, pp. 1105-1114, 2002.
[15] C. Hsu and C. Lin, "A comparison of methods for multi-class support vector machines," IEEE Transactions on Neural Networks, vol. 13, pp. 415-425, 2002.
[16] T. Eitrich and B. Lang, "Parallel tuning of support vector machine learning parameters for large and unbalanced data sets," in Computational Life Sciences, First International Symposium, CompLife 2005, Konstanz, Germany, ser. Lecture Notes in Computer Science, M. R. Berthold, R. Glen, K. Diederichs, O. Kohlbacher, and I. Fischer, Eds., vol. 3695. Springer, 2005, pp. 253-264.
[17] T. P. Runarsson and S. Sigurdsson, "Asynchronous parallel evolutionary model selection for support vector machines," Neural Information Processing - Letters and Reviews, vol. 3, no. 3, pp. 59-67, June 2004.
[18] S. Celis and D. R. Musicant, "Weka-parallel: machine learning in parallel," Carleton College, Computer Science Technical Report 2002b, 2002.
[19] J.-X. Dong, A. Krzyzak, and C. Y. Suen, "A fast parallel optimization for training support vector machines," in Proceedings of 3rd International Conference on Machine Learning and Data Mining, P. Perner and A. Rosenfeld, Eds., 2003, pp. 96-105.
[20] T. Serafini, G. Zanghirati, and L. Zanni, "Parallel decomposition approaches for training support vector machines," in Proceedings of the International Conference ParCo2003, Dresden, Germany. Elsevier, 2004, pp. 259-266.
[21] N. Cristianini and J. Shawe-Taylor, An introduction to support vector machines and other kernel-based learning methods. Cambridge, UK: Cambridge University Press, 2000.
[22] V. N. Vapnik, The nature of statistical learning theory. New York: Springer, 1995.
[23] B. Schölkopf and A. J. Smola, Learning with kernels. Cambridge, MA: MIT Press, 2002.
[24] R. Herbrich, Learning kernel classifiers: theory and algorithms. Cambridge, MA, USA: MIT Press, 2001.
[25] T. Eitrich and B. Lang, "Efficient optimization of support vector machine learning parameters for unbalanced datasets," Journal of Computational and Applied Mathematics, 2005, in press.
[26] C.-C. Chang and C.-J. Lin, LIBSVM: a library for support vector machines, 2001, software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
[27] R. Collobert and S. Bengio, "SVMTorch: a support vector machine for large-scale regression and classification problems," Journal of Machine Learning Research, vol. 1, pp. 143-160, 2001. (Online). Available: http://www.idiap.ch/learning/SVMTorch.html
[28] T. Joachims, "SVM-light support vector machine," Webpage, Mar 2002, http://www.cs.cornell.edu/People/tj/svm_light/, Version: 5.00, Date: 03.07.2002. (Online). Available: http://www.cs.cornell.edu/People/tj/svm_light
[29] S. Rüping, "mySVM-manual," Webpage, 2000. (Online). Available: http://www-ai.cs.uni-dortmund.de/SOFTWARE/MYSVM
[30] C. J. C. Burges, "A tutorial on support vector machines for pattern recognition," Data Mining and Knowledge Discovery, vol. 2, no. 2, pp. 121-167, 1998.
[31] T. Eitrich and B. Lang, "Shared memory parallel support vector machine learning," Research Centre Jülich, Preprint FZJ-ZAM-IB-2005-11, September 2005, submitted for publication.
[32] P. Laskov, "Feasible direction decomposition algorithms for training support vector machines," Machine Learning, vol. 46, no. 1-3, pp. 315-349, 2002.
[33] S. Leyffer, "The return of the active set method," 2005, to appear in Oberwolfach Report.
[34] T. Serafini, G. Zanghirati, and L. Zanni, "Gradient projection methods for quadratic programs and applications in training support vector machines," Optimization Methods and Software, vol. 20, no. 2-3, pp. 353-378, 2005.
[35] J. Platt, "Fast training of support vector machines using sequential minimal optimization," in Advances in Kernel Methods - Support Vector Learning, B. Schölkopf, C. J. C. Burges, and A. J. Smola, Eds. Cambridge, MA: MIT Press, 1999, pp. 185-208.
[36] S. Hettich, C. L. Blake, and C. J. Merz, UCI repository of machine learning databases, 1998, http://www.ics.uci.edu/~mlearn/MLRepository.html.
[37] G. Zanghirati and L. Zanni, "A parallel solver for large quadratic programs in training support vector machines," Parallel Computing, vol. 29, no. 4, pp. 535-551, 2003.
[38] U. Detert, Introduction to the JUMP architecture, 2004. (Online). Available: http://jumpdoc.fz-juelich.de
[39] T. Joachims, "Making large-scale SVM learning practical," in Advances in Kernel Methods - Support Vector Learning, B. Schölkopf, C. Burges, and A. Smola, Eds. MIT Press, 1998, pp. 169-185.
[40] O. Chapelle, V. N. Vapnik, O. Bousquet, and S. Mukherjee, "Choosing multiple parameters for support vector machines," Machine Learning, vol. 46, no. 1, pp. 131-159, 2002.