# Gradient Descent Converges to Minimizers

```bibtex
@article{Lee2016GradientDC,
  title   = {Gradient Descent Converges to Minimizers},
  author  = {J. Lee and Max Simchowitz and Michael I. Jordan and Benjamin Recht},
  journal = {ArXiv},
  year    = {2016},
  volume  = {abs/1602.04915}
}
```

We show that gradient descent converges to a local minimizer, almost surely with random initialization. This is proved by applying the Stable Manifold Theorem from dynamical systems theory.
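To make the statement concrete, here is a minimal sketch (not code from the paper) on the strict saddle f(x, y) = (x^2 - y^2)/2: the saddle's stable manifold under the gradient descent map is the measure-zero set {y = 0}, so a random initialization avoids it almost surely and the iterates escape.

```python
import numpy as np

def grad_f(p):
    # Gradient of f(x, y) = (x**2 - y**2) / 2: a strict saddle at the
    # origin, since the Hessian diag(1, -1) has a negative eigenvalue.
    x, y = p
    return np.array([x, -y])

def gradient_descent(p0, step=0.1, iters=100):
    p = np.array(p0, dtype=float)
    for _ in range(iters):
        p = p - step * grad_f(p)
    return p

# Initialized exactly on the stable manifold {y = 0}, the iterates
# converge to the saddle at the origin:
print(gradient_descent([1.0, 0.0]))          # ~[0., 0.]

# A random (Gaussian) initialization misses {y = 0} with probability one,
# and the y-coordinate grows by a factor (1 + step) per iteration:
rng = np.random.default_rng(0)
print(gradient_descent(rng.normal(size=2)))  # |y| is large
```

The Stable Manifold Theorem generalizes this picture: near any strict saddle, the initializations attracted to the saddle form a lower-dimensional invariant manifold, which has Lebesgue measure zero.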

#### 185 Citations

Gradient Descent Converges to Minimizers: The Case of Non-Isolated Critical Points

- Computer Science, Mathematics
- ArXiv
- 2016

We prove that the set of initial conditions from which gradient descent converges to strict saddle points has (Lebesgue) measure zero, even for non-isolated critical points, answering an open question…
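
For reference, the strict saddle condition these measure-zero results concern can be stated as follows (a standard definition, reproduced here for convenience):

```latex
% x* is a strict saddle of f if it is a critical point whose Hessian
% has at least one strictly negative eigenvalue:
\[
  \nabla f(x^\ast) = 0
  \qquad\text{and}\qquad
  \lambda_{\min}\!\left(\nabla^2 f(x^\ast)\right) < 0 .
\]
```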

Block Coordinate Descent Only Converge to Minimizers

- Mathematics
- 2017

Given a non-convex twice continuously differentiable cost function with Lipschitz continuous gradient, we prove that all of block coordinate gradient descent, block mirror descent and proximal block…
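
As a point of reference, here is a minimal sketch of plain cyclic block coordinate gradient descent, the first of the methods named above (an illustrative sketch, not the paper's setting or analysis):

```python
import numpy as np

def block_coordinate_descent(grad_f, x0, blocks, step=0.1, iters=100):
    """Cyclic block coordinate gradient descent: each inner step updates
    one block of coordinates using that block of the full gradient."""
    x = np.array(x0, dtype=float)
    for _ in range(iters):
        for block in blocks:                # e.g. blocks = [[0, 1], [2, 3]]
            g = grad_f(x)
            x[block] = x[block] - step * g[block]
    return x

# Example: minimize ||x||^2 block by block; its gradient is 2x.
print(block_coordinate_descent(lambda x: 2.0 * x, [1.0, -2.0, 3.0, -4.0],
                               [[0, 1], [2, 3]]))
```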

Block Coordinate Descent Almost Surely Converges to a Stationary Point Satisfying the Second-order Necessary Condition

- 2017

Given a non-convex twice continuously differentiable cost function with Lipschitz continuous gradient, we prove that all of the block coordinate gradient descent, block mirror descent and proximal…

Gradient Descent Learns Linear Dynamical Systems

- Computer Science, Mathematics
- J. Mach. Learn. Res.
- 2018

We prove that gradient descent efficiently converges to the global optimizer of the maximum likelihood objective of an unknown linear time-invariant dynamical system from a sequence of noisy…

Correction: Beyond convexity—Contraction and global convergence of gradient descent

- Medicine
- PloS one
- 2020

Gradient Descent Only Converges to Minimizers: Non-Isolated Critical Points and Invariant Regions

- Mathematics, Computer Science
- ITCS
- 2017

It is proved that the set of initial conditions from which gradient descent converges to saddle points where the Hessian of f has at least one strictly negative eigenvalue has (Lebesgue) measure zero, even for cost functions f with non-isolated critical points, answering an open question in [12].

Certain Systems Arising in Stochastic Gradient Descent

- Mathematics
- 2019

Stochastic approximation is a rich branch of probability theory with a wide range of applications…

Competitive Gradient Descent

- Mathematics, Computer Science
- NeurIPS
- 2019

We introduce a new algorithm for the numerical computation of Nash equilibria of competitive two-player games. Our method is a natural generalization of gradient descent to the two-player setting…

An Improved Adagrad Gradient Descent Optimization Algorithm

- Computer Science
- 2018 Chinese Automation Congress (CAC)
- 2018

The results show that the proposed improved Adagrad gradient descent optimization algorithm has a more stable convergence process and can reduce overfitting in multiple epochs.
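
For context, here is a minimal sketch of the standard Adagrad update that such variants build on (the paper's improved rule itself is not reproduced here):

```python
import numpy as np

def adagrad_step(theta, grad, accum, lr=0.01, eps=1e-8):
    """One standard Adagrad update: each coordinate's effective step size
    shrinks as its squared gradients accumulate."""
    accum = accum + grad ** 2
    theta = theta - lr * grad / (np.sqrt(accum) + eps)
    return theta, accum

# Usage: carry `accum` across iterations, starting from zeros.
theta, accum = np.array([1.0, -1.0]), np.zeros(2)
theta, accum = adagrad_step(theta, 2.0 * theta, accum)  # gradient of ||x||^2
```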

Asymptotic Escape of Spurious Critical Points on the Low-rank Matrix Manifold

- Mathematics, Computer Science
- ArXiv
- 2021

It is shown that using the dynamical low-rank approximation and a rescaled gradient flow, some of the spurious critical points can be converted to classical strict saddle points, which leads to the desired result.

#### References

Showing 1–10 of 44 references.

Cubic regularization of Newton method and its global performance

- Mathematics, Computer Science
- Math. Program.
- 2006

This paper provides a theoretical analysis of a cubic regularization of the Newton method applied to the unconstrained minimization problem and proves general local convergence results for this scheme.
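
The scheme analyzed there chooses each step by minimizing a cubic upper model of f, where M is a Lipschitz constant for the Hessian:

```latex
\[
  x_{k+1} \in \arg\min_{y}\;
    \langle \nabla f(x_k),\, y - x_k \rangle
    + \tfrac{1}{2} \langle \nabla^2 f(x_k)\,(y - x_k),\, y - x_k \rangle
    + \tfrac{M}{6}\, \lVert y - x_k \rVert^{3} .
\]
```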

Global Stability of Dynamical Systems

- Mathematics
- 1986

Contents: 1. Generalities; 2. Filtrations; 3. Sequences of Filtrations; 4. Hyperbolic Sets; 5. Stable Manifolds; 6. Stable Manifolds for Hyperbolic Sets; 7. More Consequences of Hyperbolicity; 8. Stability; 9. …

Escaping From Saddle Points - Online Stochastic Gradient for Tensor Decomposition

- Mathematics, Computer Science
- COLT
- 2015

This paper identifies the strict saddle property for non-convex problems, which allows for efficient optimization of orthogonal tensor decomposition, and shows that stochastic gradient descent converges to a local minimum in a polynomial number of iterations.
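
A minimal sketch of the noise-injection idea behind that result (illustrative only; not the paper's exact algorithm, step sizes, or constants):

```python
import numpy as np

def noisy_gradient_descent(grad_f, p0, step=0.05, noise=0.01, iters=200, seed=0):
    """Gradient descent with isotropic noise added to every step. Near a
    strict saddle the noise puts mass on the negative-curvature direction,
    so the iterates escape rather than stall."""
    rng = np.random.default_rng(seed)
    p = np.array(p0, dtype=float)
    for _ in range(iters):
        p = p - step * (grad_f(p) + noise * rng.normal(size=p.shape))
    return p

# Started exactly at the saddle of f(x, y) = (x**2 - y**2) / 2, noiseless
# gradient descent would never move; the noisy iterates escape along y.
print(noisy_gradient_descent(lambda p: np.array([p[0], -p[1]]), [0.0, 0.0]))
```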

Convergence of the Iterates of Descent Methods for Analytic Cost Functions

- Computer Science, Mathematics
- SIAM J. Optim.
- 2005

It is shown that the iterates of numerical descent algorithms for an analytic cost function converge to a single limit point when they satisfy certain natural descent conditions, strengthening classical "weak convergence" results for descent methods to "strong limit-point convergence" for a large class of cost functions of practical interest.

Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward–backward splitting, and regularized Gauss–Seidel methods

- Mathematics, Computer Science
- Math. Program.
- 2013

This work proves an abstract convergence result for descent methods that satisfy a sufficient-decrease assumption and allow a relative error tolerance; the result guarantees convergence of bounded sequences under the assumption that the function f satisfies the Kurdyka–Łojasiewicz inequality.
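
For reference, the Kurdyka–Łojasiewicz inequality asks, roughly, that f can be desingularized around each critical point x̄: there exist η > 0 and a concave φ with φ(0) = 0 and φ′ > 0 such that, for all x near x̄ with f(x̄) < f(x) < f(x̄) + η,

```latex
\[
  \varphi'\bigl(f(x) - f(\bar{x})\bigr)\,
  \operatorname{dist}\bigl(0,\, \partial f(x)\bigr) \;\ge\; 1 ,
\]
```

where ∂f denotes the limiting subdifferential. Semi-algebraic and tame functions satisfy this automatically, which is why the convergence result applies to them.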

On the saddle point problem for non-convex optimization

- Computer Science
- ArXiv
- 2014

It is argued, based on results from statistical physics, random matrix theory, and neural network theory, that a deeper and more profound difficulty originates from the proliferation of saddle points, not local minima, especially in high dimensional problems of practical interest.

Newton-type methods for unconstrained and linearly constrained optimization

- Mathematics, Computer Science
- Math. Program.
- 1974

The methods are intimately based on the recurrence of matrix factorizations and are linked to earlier work on quasi-Newton methods and quadratic programming.

Identifying and attacking the saddle point problem in high-dimensional non-convex optimization

- Computer Science, Mathematics
- NIPS
- 2014

This paper proposes a new approach to second-order optimization, the saddle-free Newton method, that can rapidly escape high-dimensional saddle points, unlike gradient descent and quasi-Newton methods; it applies this algorithm to deep and recurrent neural network training and provides numerical evidence for its superior optimization performance.
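
A minimal sketch of the saddle-free Newton direction (using a dense eigendecomposition for clarity; the paper itself works in a Krylov subspace for scalability, and the damping term here is an implementation detail, not from the paper):

```python
import numpy as np

def saddle_free_newton_direction(grad, hess, damping=1e-3):
    """Rescale the gradient by the inverse of |H|, the Hessian with its
    eigenvalues replaced by their absolute values, so negative-curvature
    directions are followed downhill instead of toward the saddle."""
    eigvals, eigvecs = np.linalg.eigh(hess)
    inv_abs = eigvecs @ np.diag(1.0 / (np.abs(eigvals) + damping)) @ eigvecs.T
    return -inv_abs @ grad
```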

Proximal Alternating Minimization and Projection Methods for Nonconvex Problems: An Approach Based on the Kurdyka-Lojasiewicz Inequality

- Mathematics, Computer Science
- Math. Oper. Res.
- 2010

A convergent proximal reweighted l1 algorithm for compressive sensing and an application to rank reduction problems are provided; convergence depends on the geometrical properties of the function L around its critical points.

On the use of directions of negative curvature in a modified newton method

- Mathematics, Computer Science
- Math. Program.
- 1979

A modified Newton method for the unconstrained minimization problem is presented and it is shown how the Bunch and Parlett decomposition of a symmetric indefinite matrix can be used to give entirely adequate directions of negative curvature.
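
A minimal sketch of the underlying idea (using an eigendecomposition here, where the paper uses the Bunch and Parlett factorization of the indefinite Hessian):

```python
import numpy as np

def modified_newton_direction(grad, hess):
    """Newton direction when the Hessian is positive definite; otherwise a
    direction of negative curvature d (with d^T H d < 0), oriented downhill."""
    eigvals, eigvecs = np.linalg.eigh(hess)
    if eigvals[0] > 0:                  # positive definite: plain Newton step
        return -np.linalg.solve(hess, grad)
    d = eigvecs[:, 0]                   # eigenvector of most negative eigenvalue
    return -d if grad @ d > 0 else d
```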