For what class of functions is steepest descent guaranteed to converge in a finite number of iterations? Is there any classification of $\mathbb{R}^n \to \mathbb{R}$ functions with respect to finite convergence of steepest descent (and similar first-order methods, e.g., nonlinear conjugate gradients)?

mxty42ued

2022-11-16

By finite convergence I mean that the algorithm reaches a point $x_k$ with $\|\nabla f(x_k)\| = 0$ after a finite number of iterations, given that no round-off errors occur (and perhaps assuming that the line search is exact).
It is known that these algorithms are globally convergent, i.e. $\lim_{k \to \infty} \|\nabla f(x_k)\| = 0$, but there is no guarantee that this happens for finite $k$.
For example, the conjugate gradient method is guaranteed to converge in at most $n$ steps for positive definite quadratic functions. Are there any similar results for steepest descent?
What about general (strongly) convex functions? I was only able to find results of the type
$$k = O\left(\frac{\|x_0 - x^*\|^2}{\epsilon}\right)$$
to achieve $f(x_k) - f(x^*) \le \epsilon$, where $f$ is smooth convex and $x^*$ is a stationary point.
As far as I understand, this does not imply finite convergence for $\epsilon = 0$. Is it known that finite convergence does not happen even for strongly convex functions?
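For a concrete illustration of the contrast described above, here is a minimal numerical sketch (added for illustration; the random $5 \times 5$ test problem and the NumPy implementation are assumptions, not part of the original question). On a symmetric positive definite quadratic, the linear conjugate gradient method reaches a numerically zero gradient within $n$ iterations, while steepest descent with exact line search (whose exact step on a quadratic is $\gamma_k = g_k^T g_k / g_k^T A g_k$) only shrinks it.

```python
# Illustrative sketch (hypothetical test problem): compare steepest descent
# with exact line search against linear conjugate gradients on
# f(x) = 1/2 x^T A x - b^T x with A symmetric positive definite.
import numpy as np

rng = np.random.default_rng(0)
n = 5
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)          # symmetric positive definite
b = rng.standard_normal(n)

def grad(x):
    return A @ x - b                 # gradient of the quadratic

# Steepest descent with exact line search: gamma_k = (g^T g) / (g^T A g)
x = np.zeros(n)
for _ in range(n):
    g = grad(x)
    x = x - (g @ g) / (g @ A @ g) * g
print("steepest descent, ||grad|| after n steps:", np.linalg.norm(grad(x)))

# Linear conjugate gradients, at most n iterations
x = np.zeros(n)
r = b - A @ x                        # residual = -gradient
p = r.copy()
for _ in range(n):
    Ap = A @ p
    alpha = (r @ r) / (p @ Ap)
    x = x + alpha * p
    r_new = r - alpha * Ap
    if np.linalg.norm(r_new) < 1e-14:  # terminated up to round-off
        r = r_new
        break
    p = r_new + (r_new @ r_new) / (r @ r) * p
    r = r_new
print("conjugate gradients, ||grad|| after <= n steps:", np.linalg.norm(grad(x)))
```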

Answer & Explanation

Gilbert Petty

2022-11-17

Step 1
Steepest descent method:
$$x_{k+1} := x_k - \gamma_k \nabla f(x_k)$$
with
$$\gamma_k := \arg\min_{\gamma \ge 0} \phi_k(\gamma)$$
and
$$\phi_k(\gamma) = f(x_k - \gamma \nabla f(x_k)).$$
This method also terminates after finitely many steps for quadratic functions $f(x) := \frac{1}{2} x^T A x - b^T x$ with $A$ symmetric and positive definite, with the following choice of $\gamma_k$:
$$\gamma_k := \frac{1}{\lambda_{k+1}} \qquad (k = 0, 1, \ldots, n-1),$$
where $\lambda_1, \ldots, \lambda_n$ are the eigenvalues of $A$.
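A short numerical check of this claim (added as an illustration; the $4 \times 4$ random matrix is an arbitrary assumption): running the gradient iteration with step sizes equal to the reciprocals of the eigenvalues of $A$ drives the gradient to (numerically) zero after exactly $n$ steps, because step $k$ eliminates the error component along the eigenvector belonging to $\lambda_{k+1}$.

```python
# Sketch of the finite-termination property above (hypothetical SPD matrix).
# With gamma_k = 1 / lambda_{k+1}, step k removes the error component along
# the eigenvector of A for lambda_{k+1}, so the gradient vanishes after n steps.
import numpy as np

rng = np.random.default_rng(1)
n = 4
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)           # symmetric positive definite
b = rng.standard_normal(n)
lams = np.linalg.eigvalsh(A)          # eigenvalues lambda_1, ..., lambda_n

x = np.zeros(n)
for k in range(n):
    g = A @ x - b                     # gradient of f(x) = 1/2 x^T A x - b^T x
    x = x - (1.0 / lams[k]) * g       # gamma_k = 1 / lambda_{k+1}
    print(f"step {k + 1}: ||grad|| = {np.linalg.norm(A @ x - b):.2e}")
```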
Step 2
However, this is not used in practice, because computing the eigenvalues of $A$ is usually much more expensive than solving $Ax = b$ directly. (Solving $Ax = b$ corresponds to setting the gradient of $f$ to zero and thus finding the critical point; since $A$ is positive definite, that critical point is the global minimizer.)
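For comparison, a brief sketch of the direct approach (same kind of illustrative random problem as above): setting $\nabla f(x) = Ax - b$ to zero is a single linear solve, with no eigenvalue computation.

```python
# Direct approach: solve A x = b once, i.e. set the gradient A x - b to zero
# (illustrative random SPD problem, same construction as in the sketch above).
import numpy as np

rng = np.random.default_rng(1)
n = 4
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)
b = rng.standard_normal(n)

x_star = np.linalg.solve(A, b)
print("||grad f(x*)|| =", np.linalg.norm(A @ x_star - b))
```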
