Convergence
An iterative algorithm possesses the descent property if the value of the objective function decreases at each new point the algorithm generates.
An iterative algorithm is globally convergent if for any arbitrary starting point the algorithm is guaranteed to generate a sequence of points converging to a point that satisfies the FONC for a minimizer.
When the algorithm is not globally convergent, it may still generate a sequence that converges to a point satisfying the FONC, provided the initial point is sufficiently close to that point. In this case we say that the algorithm is locally convergent.
The rate of convergence measures how fast the algorithm converges to a solution point.
For the quadratic $f(\b{x}) = \frac{ 1 }{ 2 } \transpose{\b{x}} \b{Q} \b{x} - \transpose{\b{b}} \b{x}$ with minimizer $\b{x}^* = \inv{\b{Q}} \b{b}$, define
$$V(\b{x}) = f(\b{x}) + \frac{ 1 }{ 2 } \transpose{(\b{x}^*)} \b{Q} \b{x}^* = \frac{ 1 }{ 2 } \transpose{(\b{x} - \b{x}^*)} \b{Q} (\b{x} - \b{x}^*) $$
$V$ differs from $f$ only by a constant, so the two have the same minimizer; $V$ measures the squared distance from $\b{x}$ to $\b{x}^*$ in the norm induced by $\b{Q}$, and $V(\b{x}^{(k)}) \to 0$ if and only if $\b{x}^{(k)} \to \b{x}^*$.
The iterative algorithm
$$\b{x}^{(k+1)} = \b{x}^{(k)} - \alpha_k \b{g}^{(k)} $$
with $\b{g}^{(k)} = \b{Q}\b{x}^{(k)} - \b{b}$ satisfies
$$V(\b{x}^{(k+1)}) = (1 - \gamma_k) V(\b{x}^{(k)})$$
where, if $\b{g}^{(k)} = 0$ then $\gamma_k = 1 $, and if $\b{g}^{(k)} \neq 0 $ then
$$\gamma_k = \alpha_k \frac{ \transpose{{\b{g}^{(k)}}} \b{Q} \b{g}^{(k)}}{ \transpose{{\b{g}^{(k)}}} \inv{ \b{Q}} \b{g}^{(k)}} \pare{2 \frac{ \transpose{{\b{g}^{(k)}}} \b{g}^{(k)}}{ \transpose{{\b{g}^{(k)}}} \b{Q} \b{g}^{(k)}} - \alpha_k} $$
Let $\set{ \b{x}^{(k)}} $ be the sequence resulting from a gradient algorithm $\b{x}^{(k+1)} = \b{x}^{(k)} - \alpha_k \b{g}^{(k)} $. Let $\gamma_k $ be as defined in the previous lemma, and suppose that $\gamma_k > 0 $ for all $k $. Then, $\set{ \b{x}^{(k)}} $ converges to $\b{x}^{*} $ for any initial condition $\b{x}^{(0)}$ if and only if
$$\sum_{ k =0 }^{ \infty } \gamma_k = \infty $$
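The identity $V(\b{x}^{(k+1)}) = (1 - \gamma_k) V(\b{x}^{(k)})$ can be checked numerically. A minimal sketch (the matrix $\b{Q}$, vector $\b{b}$, and starting point are my own illustrative choices, with $\alpha_k$ chosen by exact line search):

```python
import numpy as np

# Illustrative quadratic f(x) = 1/2 x^T Q x - b^T x (Q, b are my choices).
Q = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x_star = np.linalg.solve(Q, b)        # minimizer satisfies Q x* = b
Qinv = np.linalg.inv(Q)

def V(x):
    d = x - x_star
    return 0.5 * d @ Q @ d            # V(x) = 1/2 (x - x*)^T Q (x - x*)

x = np.array([10.0, -7.0])            # arbitrary starting point
for k in range(5):
    g = Q @ x - b                     # gradient g^(k)
    alpha = (g @ g) / (g @ Q @ g)     # exact line search (steepest descent)
    gamma = alpha * (g @ Q @ g) / (g @ Qinv @ g) \
        * (2 * (g @ g) / (g @ Q @ g) - alpha)
    x_next = x - alpha * g
    # The lemma predicts V(x^(k+1)) = (1 - gamma_k) V(x^(k)):
    assert np.isclose(V(x_next), (1 - gamma) * V(x))
    x = x_next
```

The same identity holds for any positive step sizes $\alpha_k$, not just the exact-line-search choice used here.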
Let $\b{Q} = \transpose{ \b{Q}} > 0 $ be an $n \times n $ real symmetric positive definite matrix. Then, for any $\b{x} \in \R^n $, we have
$$\frac{ \lambda_{\text{min}}(\b{Q})}{ \lambda_{\text{max}}(\b{Q})} \leq \frac{(\transpose{ \b{x}} \b{x})^2 }{(\transpose{ \b{x}} \b{Q} \b{x})(\transpose{ \b{x}} \inv{ \b{Q}} \b{x})} \leq \frac{ \lambda_{\text{max}}(\b{Q})}{ \lambda_{\text{min}}(\b{Q})}$$
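These bounds can be sanity-checked on random vectors; a sketch using a randomly generated positive definite matrix of my own choosing:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
Q = A @ A.T + 4 * np.eye(4)              # random symmetric positive definite
Qinv = np.linalg.inv(Q)
# eigvalsh returns eigenvalues in ascending order
lmin, lmax = np.linalg.eigvalsh(Q)[[0, -1]]

for _ in range(1000):
    x = rng.standard_normal(4)
    r = (x @ x) ** 2 / ((x @ Q @ x) * (x @ Qinv @ x))
    assert lmin / lmax <= r <= lmax / lmin
```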
In the steepest descent algorithm, we have $\b{x}^{(k)} \to \b{x}^* $ for any $\b{x}^{(0)} $.
For the fixed step size gradient algorithm, $\b{x}^{(k)} \to \b{x}^*$ for any $\b{x}^{(0)} $ if and only if
$$0 < \alpha < \frac{ 2 }{ \lambda_{\text{max}}(\b{Q})} $$
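Both directions of this condition are easy to observe numerically: a step size inside the interval contracts the error, while one outside makes the component along the largest eigenvector grow. A sketch on an illustrative quadratic of my own choosing:

```python
import numpy as np

Q = np.array([[4.0, 1.0], [1.0, 3.0]])   # illustrative Q and b
b = np.array([1.0, 2.0])
x_star = np.linalg.solve(Q, b)
lmax = np.linalg.eigvalsh(Q)[-1]         # largest eigenvalue of Q

def run(alpha, steps=200):
    x = np.array([5.0, -5.0])
    for _ in range(steps):
        x = x - alpha * (Q @ x - b)      # fixed step size gradient iteration
    return np.linalg.norm(x - x_star)

err_in = run(1.0 / lmax)                 # inside (0, 2/lmax): converges
err_out = run(2.1 / lmax)                # outside the interval: diverges
```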
Convergence Rate
In the method of steepest descent applied to the quadratic function, at every step $k$, we have
$$ V(\b{x}^{(k + 1)}) \leq \pare{ \frac{ \lambda_\text{max}(\b{Q}) - \lambda_\text{min}(\b{Q})}{ \lambda_\text{max}(\b{Q})}} V(\b{x}^{(k)}) $$
Sorry, the note for this chapter stops here. I will probably come back to it in the future.