On stochastic optimization and the Adam optimizer: Divergence, convergence rates, and acceleration techniques