Fast Approximate Bayesian Contextual Cold Start Learning (FAB-COST)

Authors: Jack R. McKenzie, Peter A. Appleby, Thomas House, Neil Walton

Abstract:

Cold-start is a notoriously difficult problem that can occur in recommendation systems, and arises when there is insufficient information to draw inferences for users or items. To address this challenge, a contextual bandit algorithm, the Fast Approximate Bayesian Contextual Cold Start Learning algorithm (FAB-COST), is proposed, which is designed to provide improved accuracy compared to the traditionally used Laplace approximation in the logistic contextual bandit, while controlling both algorithmic complexity and computational cost. To this end, FAB-COST combines two moment-projection variational methods: Expectation Propagation (EP), which performs well at the cold start but becomes slow as the amount of data increases; and Assumed Density Filtering (ADF), whose computational cost grows more slowly with data size but which requires more data to reach an acceptable level of accuracy. By switching from EP to ADF once the dataset becomes large, FAB-COST exploits their complementary strengths. Empirical justification for FAB-COST is presented, and it is systematically compared to other approaches on simulated data. In a benchmark against the Laplace approximation on real data consisting of over 670,000 impressions from autotrader.co.uk, FAB-COST demonstrates at one point an increase of over 16% in user clicks. On the basis of these results, it is argued that FAB-COST is likely to be an attractive approach to cold-start recommendation systems in a variety of contexts.
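
To make the ADF side of the switching idea concrete, the following is a minimal Python sketch, not the authors' implementation: a Thompson-sampling loop for a logistic contextual bandit in which each arm keeps a Gaussian posterior over its weights, updated after every impression by one-dimensional moment matching (Gauss-Hermite quadrature on the activation). All names (adf_update, thompson_choice, n_arms, the toy data) are illustrative assumptions; the EP phase used during cold start, which repeatedly re-sweeps the stored impressions, is only indicated in a comment.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def adf_update(m, S, x, y, n_quad=32):
    """One Assumed Density Filtering step for Bayesian logistic regression.

    The Gaussian posterior N(m, S) over the weights is projected onto the
    scalar activation a = x.w, the tilted distribution N(a) * sigmoid(y*a)
    is moment-matched by Gauss-Hermite quadrature, and the matched moments
    are mapped back to an updated (m, S).  y is +1 (click) or -1 (no click).
    """
    mu = x @ m                          # prior mean of the activation
    var = x @ S @ x                     # prior variance of the activation
    nodes, weights = np.polynomial.hermite.hermgauss(n_quad)
    a = mu + np.sqrt(2.0 * var) * nodes             # quadrature points for N(mu, var)
    lik = sigmoid(y * a)
    Z = weights @ lik / np.sqrt(np.pi)              # normalising constant
    mu_t = weights @ (lik * a) / (np.sqrt(np.pi) * Z)                # tilted mean
    var_t = weights @ (lik * a**2) / (np.sqrt(np.pi) * Z) - mu_t**2  # tilted variance
    Sx = S @ x
    m_new = m + Sx * (mu_t - mu) / var
    S_new = S + np.outer(Sx, Sx) * (var_t - var) / var**2
    return m_new, S_new

def thompson_choice(arms, x, rng):
    """Draw weights from each arm's Gaussian posterior and play the arm with
    the highest sampled click probability (Thompson sampling)."""
    samples = [sigmoid(x @ rng.multivariate_normal(m, S)) for m, S in arms]
    return int(np.argmax(samples))

# Toy run: 3 ads, 5 context features, standard-normal prior on each arm's weights.
rng = np.random.default_rng(0)
d, n_arms = 5, 3
true_w = rng.normal(size=(n_arms, d))               # unknown "ground truth" weights
arms = [(np.zeros(d), np.eye(d)) for _ in range(n_arms)]
for t in range(2000):
    x = rng.normal(size=d)                          # context for this impression
    k = thompson_choice(arms, x, rng)
    y = 1 if rng.random() < sigmoid(x @ true_w[k]) else -1   # simulated click / no click
    # A full FAB-COST-style scheme would re-sweep the stored impressions with EP
    # while t is small, and switch to this cheaper single-pass ADF update later.
    arms[k] = adf_update(*arms[k], x, y)
```

The sketch illustrates why the switch pays off: an EP re-sweep touches every stored impression, so its per-update cost grows with the amount of data, whereas the ADF update above costs a fixed amount per impression once enough data has accumulated for a single pass to be accurate.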

Keywords: Cold-start, expectation propagation, multi-armed bandits, Thompson sampling, variational inference.

