Department of Mathematics
Math @ Duke

Publications [#382943] of Rong Ge

Papers Published

  1. Zhou, M; Ge, R, How Does Gradient Descent Learn Features - A Local Analysis for Regularized Two-Layer Neural Networks, Advances in Neural Information Processing Systems, vol. 37 (January, 2024)

    Abstract:
    The ability to learn useful features is one of the major advantages of neural networks. Although recent works show that neural networks can operate in a neural tangent kernel (NTK) regime that does not allow feature learning, many works also demonstrate the potential for neural networks to go beyond the NTK regime and perform feature learning. Recently, a line of work highlighted the feature learning capabilities of the early stages of gradient-based training. In this paper we consider another mechanism for feature learning via gradient descent, through a local convergence analysis. We show that once the loss is below a certain threshold, gradient descent with a carefully regularized objective will capture ground-truth directions. We further strengthen this local convergence analysis by incorporating early-stage feature learning analysis. Our results demonstrate that feature learning not only happens at the initial gradient steps, but can also occur towards the end of training.
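
    To make the setting concrete, the following is a minimal, hypothetical sketch (not the paper's construction or proofs): a two-layer ReLU network trained by full-batch gradient descent on a squared loss with a plain L2 penalty, where synthetic labels depend only on a few ground-truth directions, and the alignment of first-layer weights with those directions serves as a crude proxy for feature learning. All dimensions, hyperparameters, and the choice of regularizer are illustrative assumptions.

    import torch

    torch.manual_seed(0)

    d, k, m, n = 20, 2, 64, 2000        # input dim, # true directions, width, samples
    lr, wd, steps = 0.05, 1e-3, 2000    # step size, L2 coefficient, GD iterations

    # Ground-truth directions (orthonormal columns) and labels that depend only on them
    U = torch.linalg.qr(torch.randn(d, k)).Q         # d x k orthonormal basis
    X = torch.randn(n, d)
    y = torch.relu(X @ U).sum(dim=1, keepdim=True)   # target uses only span(U)

    # Two-layer ReLU network f(x) = a^T relu(W x), small random initialization
    W = (0.1 * torch.randn(m, d)).requires_grad_()
    a = (0.1 * torch.randn(m, 1)).requires_grad_()

    for _ in range(steps):
        pred = torch.relu(X @ W.T) @ a
        # squared loss plus an explicit L2 penalty; here the "regularized objective"
        # is just plain weight decay, chosen for illustration only
        loss = ((pred - y) ** 2).mean() + wd * (W.pow(2).sum() + a.pow(2).sum())
        loss.backward()
        with torch.no_grad():
            W -= lr * W.grad
            a -= lr * a.grad
            W.grad.zero_()
            a.grad.zero_()

    # Alignment: fraction of each neuron's weight norm that lies in span(U)
    with torch.no_grad():
        proj = (W @ U) @ U.T                         # project rows of W onto span(U)
        align = proj.norm(dim=1) / W.norm(dim=1).clamp_min(1e-12)
        print(f"final loss {loss.item():.4f}, mean alignment {align.mean().item():.3f}")

    On this toy setup one would expect the mean alignment to rise well above its value at random initialization as the regularized loss is driven down; none of the numbers here come from the paper.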

 
