Ranking - Convex Risk Minimization
Authors: Wojciech Rejchel
Abstract:
The problem of ranking (rank regression) has become popular in the machine learning community. This theory relates to problems, in which one has to predict (guess) the order between objects on the basis of vectors describing their observed features. In many ranking algorithms a convex loss function is used instead of the 0-1 loss. It makes these procedures computationally efficient. Hence, convex risk minimizers and their statistical properties are investigated in this paper. Fast rates of convergence are obtained under conditions, that look similarly to the ones from the classification theory. Methods used in this paper come from the theory of U-processes as well as empirical processes.
Keywords: Convex loss function, empirical risk minimization, empirical process, U-process, boosting, euclidean family.
Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1063098
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1421References:
[1] S. Boucheron, O. Bousquet, G. Lugosi, Theory of classification: a survey and recent advances, Probability and Statistics, vol. 9, pp. 323-375, 2005.
[2] L. Devroye, L. Gyorfi, G. Lugosi, A probabilistic theory of pattern recognition. Springer-Verlag, New York, 1996.
[3] S. Clemenon, G. Lugosi, N. Vayatis, Ranking and empirical minimization of U-statistics, Ann. Statist., vol. 36, pp. 844-874, 2008.
[4] V. H. de la Pena, E. Gine, Decoupling: from dependence to independence. Springer-Verlag, New York, 1999.
[5] R. J. Serfling, Approximation theorems of mathematical statistics. Wiley, New York, 1980.
[6] J. Abrevaya, Computation of the maximum rank correlation estimator, Economics Letters, vol. 62, pp. 279-285, 1999.
[7] P. L. Bartlett, S. Ben-David, Hardness results for neural network approximation problems, Theoretical Computer Science, vol. 284, pp. 53-66, 2002.
[8] V. N. Vapnik, Statistical learning theory. John Wiley, New York, 1998.
[9] C. Cortes, V. N. Vapnik, Support vector networks, Machine Learning, vol. 20, pp. 273-297, 1995.
[10] R. E. Schapire, Y. Freund, P. L. Bartlett, W. S. Lee, Boosting the margin: a new explanation for the effectiveness of voting methods, Ann. Statist., vol. 26, pp. 1651-1686, 1998.
[11] Y. Freund, R. Iyer, R. E. Schapire, Y. Singer, An efficient boosting algorithm for combining preferences, J. Machine Learning Research, vol. 4, pp. 933-969, 2004.
[12] W. Niemiro, W. Rejchel, Rank correlation estimators and their limiting distributions, Statistical Papers, to be published.
[13] A. B. Tsybakov, Optimal aggregation of classifiers in statistical learning, Ann. Statist., vol. 32, pp. 135-166, 2004.
[14] M. A. Arcones, E. Gine (1993). Limit theorems for U-processes. Ann. Probab., vol. 21, pp. 1494-1542.
[15] P. Major, An estimate of the supremum of a nice class of stochastic integrals and U-statistics, Probab. Theory Related Fields, vol. 134, pp. 489-537, 2006.
[16] V. Koltchinskii, D. Panchenko, Empirical margin distributions and bounding the generalization error of combined classifiers, Ann. Statist., vol. 30, pp. 1-50, 2002.
[17] P. L. Bartlett, O. Bousquet, S. Mendelson, Local Rademacher complexities, Ann. Statist., vol. 33, pp. 1497-1537, 2005.
[18] P. L. Bartlett, M. I. Jordan, J. D. Mcauliffe, Convexity, Classification, and Risk Bounds, Journal of the American Statistical Association, vol. 101, pp. 138-156, 2006.
[19] P. Massart, Some applications of concentration inequalities to statistics, Probability theory. Ann. Fac. Sci. Toulouse Math., vol. 9 , pp. 245-303, 2000.
[20] P. Massart, Concentration Inequalities and Model Selection. Springer, Berlin, 2007.
[21] S. Mendelson, Improving the sample complexity using global data, IEEE Trans. Inform. Theory, vol. 48, pp. 1977-1991, 2002.
[22] A. Pakes, D. Pollard, Simulation and the asymptotics of optimization estimators, Econometrica, vol. 57, pp. 1027-1057, 1989.
[23] S. Mendelson, A few notes on statistical learning theory. Advanced Lectures on Machine Learning. Lecture Notes in Comput. Sci., pp. 1- 40. Springer, New York, 2003.
[24] D. Pollard, Asymptotics via Empirical Processes., Statist. Sci., vol. 4, pp. 365-366, 1989.
[25] M. A. Arcones, E. Gine, U-processes indexed by VapnikChervonenkis classes of functions with applications to asymptotics and bootstrap of Ustatistics with estimated parameters, Stochastic Process. Appl., vol. 52, pp. 17-38, 1994.
[26] D. Nolan, D. Pollard, U-processes: rates of convergence., Ann. Statist., vol. 15, pp. 780-799, 1987.