Efficient Implementation of Serial and Parallel Support Vector Machine Training with a Multi-Parameter Kernel for Large-Scale Data Mining

Authors: Tatjana Eitrich, Bruno Lang

Abstract:

This work deals with aspects of support vector learning for large-scale data mining tasks. Based on a decomposition algorithm that can be run in serial and parallel mode, we introduce a data transformation that allows the use of an expensive generalized kernel without additional costs. To speed up the decomposition algorithm, we study the problem of working set selection for large data sets and analyze the influence of the working set size on the scalability of the parallel decomposition scheme. Our modifications and settings lead to improved support vector learning performance and thus allow extensive parameter search methods to be used to optimize classification accuracy.
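The core idea can be illustrated with a short sketch. The Python fragment below is a minimal illustration (not the authors' code) of how a multi-parameter, feature-wise Gaussian kernel can be evaluated at the cost of the ordinary single-parameter kernel: the per-feature width parameters are folded into a one-time scaling of the data, after which every kernel evaluation inside the decomposition solver reduces to the standard Gaussian kernel. The function names and the toy data are illustrative assumptions.

import numpy as np

def anisotropic_gaussian(x, z, sigmas):
    # Generalized kernel k(x, z) = exp(-0.5 * sum_i ((x_i - z_i) / sigma_i)^2)
    # with one width parameter sigma_i per feature.
    d = (x - z) / sigmas
    return np.exp(-0.5 * np.dot(d, d))

def prescale(X, sigmas):
    # One-time data transformation: divide every feature by its width sigma_i.
    return X / sigmas

def plain_gaussian(x, z):
    # Standard single-parameter kernel (width 1), applied to pre-scaled data.
    d = x - z
    return np.exp(-0.5 * np.dot(d, d))

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))           # toy data: 5 points, 3 features (illustrative)
sigmas = np.array([0.5, 1.0, 2.0])    # one kernel parameter per feature (illustrative)

Xs = prescale(X, sigmas)
# Both formulations give the same kernel value, but the second one only ever
# evaluates the cheap single-parameter kernel inside the solver.
assert np.isclose(anisotropic_gaussian(X[0], X[1], sigmas), plain_gaussian(Xs[0], Xs[1]))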

Keywords: Support Vector Machines, Shared Memory Parallel Computing, Large Data
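The working set selection discussed in the abstract can likewise be sketched in a few lines. The fragment below shows one common selection rule used in decomposition methods, choosing in each iteration the q variables that most violate the KKT conditions of the dual problem (a maximal-violation rule in the style of SVMlight); it is an assumption about the general technique, not the authors' exact selection strategy.

import numpy as np

def select_working_set(alpha, grad, y, C, q):
    # Sketch of maximal-KKT-violation working set selection for the dual SVM
    # problem: min 0.5*a'Qa - e'a  s.t.  y'a = 0, 0 <= a <= C.
    # alpha : current dual variables, grad : Q @ alpha - 1,
    # y : labels in {-1, +1}, C : box constraint, q : (even) working set size.
    d = -y * grad
    # Variables that may still move "up" (along +y) or "down" without leaving the box.
    can_up = ((y > 0) & (alpha < C)) | ((y < 0) & (alpha > 0))
    can_dn = ((y > 0) & (alpha > 0)) | ((y < 0) & (alpha < C))
    up = np.where(can_up)[0]
    dn = np.where(can_dn)[0]
    # Most violating candidates: largest d among the "up" set,
    # smallest d among the "down" set.
    up = up[np.argsort(-d[up])][: q // 2]
    dn = dn[np.argsort(d[dn])][: q // 2]
    return np.concatenate([up, dn])

A larger working set size q yields fewer but more expensive subproblems per iteration, which is the trade-off the abstract refers to when analyzing how the working set size influences the scalability of the parallel decomposition scheme.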

Digital Object Identifier (DOI): https://doi.org/10.5281/zenodo.1082137

