|
Math @ Duke
|
Publications [#378737] of Rong Ge
Papers Published
- Zhu, X; Wang, Z; Wang, X; Zhou, M; Ge, R, UNDERSTANDING EDGE-OF-STABILITY TRAINING DYNAMICS WITH A MINIMALIST EXAMPLE,
11th International Conference on Learning Representations Iclr 2023
(January, 2023)
(last updated on 2026/01/15)
Abstract: Recently, researchers observed that gradient descent for deep neural networks operates in an “edge-of-stability” (EoS) regime: the sharpness (maximum eigenvalue of the Hessian) is often larger than stability threshold 2/η (where η is the step size). Despite this, the loss oscillates and converges in the long run, and the sharpness at the end is just slightly below 2/η. While many other well-understood nonconvex objectives such as matrix factorization or two-layer networks can also converge despite large sharpness, there is often a larger gap between sharpness of the endpoint and 2/η. In this paper, we study EoS phenomenon by constructing a simple function that has the same behavior. We give rigorous analysis for its training dynamics in a large local region and explain why the final converging point has sharpness close to 2/η. Globally we observe that the training dynamics for our example have an interesting bifurcating behavior, which was also observed in the training of neural nets.
|
|
|
|
dept@math.duke.edu
ph: 919.660.2800
fax: 919.660.2821
| |
Mathematics Department
Duke University, Box 90320
Durham, NC 27708-0320
|
|