Ordinal Regression with Fenton-Wilkinson Order Statistics: A Case Study of an Orienteering Race

Joonas Pääkkönen

Abstract:

In sports, individuals and teams are typically interested in final rankings. Final results, such as times or distances, determine these rankings, also known as places. Places can in turn be associated with ordered random variables, commonly referred to as order statistics. In this work, we introduce a simple yet accurate order-statistical ordinal regression function that predicts relay race places from changeover-times. We call this function the Fenton-Wilkinson Order Statistics model. The model is built on the following educated assumption: individual leg-times follow log-normal distributions. Our key idea is to combine Fenton-Wilkinson approximations of changeover-times with an estimator for the total number of teams, as in the well-known German tank problem. The resulting place regression function is sigmoidal and thus correctly predicts the existence of a small number of elite teams that significantly outperform the rest. The model also describes how place increases linearly with changeover-time at the inflection point of the log-normal distribution function. With real-world data from Jukola 2019, a massive orienteering relay race, the model is shown to be highly accurate even when the training set is only 5% of the whole data set. Numerical results also show that the model yields smaller place-prediction root-mean-square errors than linear regression, ordinal regression with the mord package, and Gaussian process regression.
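The three ingredients named in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation. It assumes independent log-normal leg-times, uses the standard Fenton-Wilkinson moment matching for their sum, uses the classical minimum-variance unbiased German tank estimator for the number of teams, and assumes that the predicted place is the estimated number of teams times the log-normal distribution function of the changeover-time. All function names and numeric values are hypothetical.

```python
import math
from statistics import NormalDist


def fenton_wilkinson(mus, sigmas):
    """Approximate a sum of independent log-normals by one log-normal.

    Matches the mean and variance of the sum (Fenton-Wilkinson) and
    returns the (mu, sigma) of the approximating log-normal.
    """
    mean_sum = sum(math.exp(m + s**2 / 2) for m, s in zip(mus, sigmas))
    var_sum = sum(
        math.exp(2 * m + s**2) * (math.exp(s**2) - 1)
        for m, s in zip(mus, sigmas)
    )
    sigma2 = math.log(1 + var_sum / mean_sum**2)
    mu = math.log(mean_sum) - sigma2 / 2
    return mu, math.sqrt(sigma2)


def german_tank_estimate(observed_places):
    """Estimate the total number of teams from a sample of observed places.

    Classical estimator m * (1 + 1/k) - 1, with m the largest observed
    place and k the sample size.
    """
    k = len(observed_places)
    m = max(observed_places)
    return m * (1 + 1 / k) - 1


def predict_place(t, mu, sigma, n_teams):
    """Assumed place model: n_teams times the log-normal CDF at time t."""
    return n_teams * NormalDist().cdf((math.log(t) - mu) / sigma)


# Hypothetical log-scale parameters for three leg-times (in minutes).
mu_c, sigma_c = fenton_wilkinson([4.4, 4.5, 4.6], [0.25, 0.25, 0.30])
n_hat = german_tank_estimate([19, 40, 42, 60])
place_at_median = predict_place(math.exp(mu_c), mu_c, sigma_c, n_hat)
```

Under these assumptions the place curve is sigmoidal in the changeover-time, and a team arriving at the median changeover-time is predicted to finish at half the estimated number of teams.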

Keywords: Fenton-Wilkinson approximation, German tank problem, log-normal distribution, order statistics, ordinal regression, orienteering, sports analytics, sports modeling.
