On a Conjecture Regarding the Adam Optimizer

Authors: Mohamed Akrout, Douglas Tweed

Abstract:

The great success of deep learning relies on efficient optimizers: the algorithms that decide how to adjust network weights and biases based on gradient information. One of the most effective and widely used optimizers in recent years has been the method of adaptive moments, or Adam, but the mathematical reasons behind its effectiveness are still unclear. Attempts to analyse its behaviour have remained incomplete, in part because they hinge on an unproven conjecture regarding ratios of powers of the first and second moments of the gradient. Here we show that this conjecture is in fact false, but that a modified version of it is true and can take its place in analyses of Adam.
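
To make the quantities mentioned in the abstract concrete, the following is a minimal sketch of the standard Adam update of Kingma and Ba (2014) in plain NumPy; the step size and decay constants are the commonly used defaults rather than values taken from this paper, and the function and variable names are ours. The update direction, a bias-corrected first moment divided by the square root of a bias-corrected second moment, is exactly the kind of ratio of powers of gradient moments that the conjecture concerns.

```python
# Minimal sketch of one Adam update step (Kingma & Ba, 2014).
# Default hyperparameters are the commonly used ones, not values from this paper.
import numpy as np

def adam_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for parameters w given gradient grad.

    m and v are the exponentially decayed first and second moments of the
    gradient; t is the 1-based step count used for bias correction.
    """
    m = beta1 * m + (1 - beta1) * grad       # first moment (running mean of gradients)
    v = beta2 * v + (1 - beta2) * grad ** 2  # second moment (running mean of squared gradients)
    m_hat = m / (1 - beta1 ** t)             # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)             # bias-corrected second moment
    # The step is driven by the ratio m_hat / sqrt(v_hat): a ratio of powers of
    # the first and second moments of the gradient.
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# Example: a single step on the scalar loss L(w) = w^2, whose gradient is 2w.
w, m, v = 1.0, 0.0, 0.0
w, m, v = adam_step(w, 2.0 * w, m, v, t=1)
```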

Keywords: Adam optimizer, Bock’s conjecture, stochastic optimization, average regret.

