$\newcommand{\br}{\\}$ $\newcommand{\R}{\mathbb{R}}$ $\newcommand{\Q}{\mathbb{Q}}$ $\newcommand{\Z}{\mathbb{Z}}$ $\newcommand{\N}{\mathbb{N}}$ $\newcommand{\C}{\mathbb{C}}$ $\newcommand{\P}{\mathbb{P}}$ $\newcommand{\F}{\mathbb{F}}$ $\newcommand{\L}{\mathcal{L}}$ $\newcommand{\spa}[1]{\text{span}(#1)}$ $\newcommand{\dist}[1]{\text{dist}(#1)}$ $\newcommand{\max}[1]{\text{max}(#1)}$ $\newcommand{\min}[1]{\text{min}(#1)}$ $\newcommand{\supr}[1]{\text{sup}(#1)}$ $\newcommand{\infi}[1]{\text{inf}(#1)}$ $\newcommand{\set}[1]{\left\{#1\right\}}$ $\newcommand{\emptyset}{\varnothing}$ $\newcommand{\otherwise}{\text{ otherwise }}$ $\newcommand{\if}{\text{ if }}$ $\newcommand{\proj}{\text{proj}}$ $\newcommand{\union}{\cup}$ $\newcommand{\intercept}{\cap}$ $\newcommand{\abs}[1]{\left| #1 \right|}$ $\newcommand{\norm}[1]{\left\lVert#1\right\rVert}$ $\newcommand{\pare}[1]{\left(#1\right)}$ $\newcommand{\t}[1]{\text{ #1 }}$ $\newcommand{\head}{\text H}$ $\newcommand{\tail}{\text T}$ $\newcommand{\d}{\text d}$ $\newcommand{\limu}[2]{\underset{#1 \to #2}\lim}$ $\newcommand{\der}[2]{\frac{\d #1}{\d #2}}$ $\newcommand{\derw}[2]{\frac{\d #1^2}{\d^2 #2}}$ $\newcommand{\pder}[2]{\frac{\partial #1}{\partial #2}}$ $\newcommand{\pderw}[2]{\frac{\partial^2 #1}{\partial #2^2}}$ $\newcommand{\pderws}[3]{\frac{\partial^2 #1}{\partial #2 \partial #3}}$ $\newcommand{\inv}[1]{{#1}^{-1}}$ $\newcommand{\inner}[2]{\langle #1, #2 \rangle}$ $\newcommand{\nullity}[1]{\text{nullity}(#1)}$ $\newcommand{\rank}[1]{\text{rank }#1}$ $\newcommand{\var}[1]{\text{var}(#1)}$ $\newcommand{\tr}[1]{\text{tr}(#1)}$ $\newcommand{\oto}{\text{ one-to-one }}$ $\newcommand{\ot}{\text{ onto }}$ $\newcommand{\ceil}[1]{\lceil#1\rceil}$ $\newcommand{\floor}[1]{\lfloor#1\rfloor}$ $\newcommand{\Re}[1]{\text{Re}(#1)}$ $\newcommand{\Im}[1]{\text{Im}(#1)}$ $\newcommand{\dom}[1]{\text{dom}(#1)}$ $\newcommand{\fnext}[1]{\overset{\sim}{#1}}$ $\newcommand{\transpose}[1]{#1^{\text{T}}}$ $\newcommand{\b}[1]{\boldsymbol{#1}}$ $\newcommand{\None}[1]{}$ $\newcommand{\Vcw}[2]{\begin{bmatrix} #1 \br #2 \end{bmatrix}}$ $\newcommand{\Vce}[3]{\begin{bmatrix} #1 \br #2 \br #3 \end{bmatrix}}$ $\newcommand{\Vcr}[4]{\begin{bmatrix} #1 \br #2 \br #3 \br #4 \end{bmatrix}}$ $\newcommand{\Vct}[5]{\begin{bmatrix} #1 \br #2 \br #3 \br #4 \br #5 \end{bmatrix}}$ $\newcommand{\Vcy}[6]{\begin{bmatrix} #1 \br #2 \br #3 \br #4 \br #5 \br #6 \end{bmatrix}}$ $\newcommand{\Vcu}[7]{\begin{bmatrix} #1 \br #2 \br #3 \br #4 \br #5 \br #6 \br #7 \end{bmatrix}}$ $\newcommand{\vcw}[2]{\begin{matrix} #1 \br #2 \end{matrix}}$ $\newcommand{\vce}[3]{\begin{matrix} #1 \br #2 \br #3 \end{matrix}}$ $\newcommand{\vcr}[4]{\begin{matrix} #1 \br #2 \br #3 \br #4 \end{matrix}}$ $\newcommand{\vct}[5]{\begin{matrix} #1 \br #2 \br #3 \br #4 \br #5 \end{matrix}}$ $\newcommand{\vcy}[6]{\begin{matrix} #1 \br #2 \br #3 \br #4 \br #5 \br #6 \end{matrix}}$ $\newcommand{\vcu}[7]{\begin{matrix} #1 \br #2 \br #3 \br #4 \br #5 \br #6 \br #7 \end{matrix}}$ $\newcommand{\Mqw}[2]{\begin{bmatrix} #1 & #2 \end{bmatrix}}$ $\newcommand{\Mqe}[3]{\begin{bmatrix} #1 & #2 & #3 \end{bmatrix}}$ $\newcommand{\Mqr}[4]{\begin{bmatrix} #1 & #2 & #3 & #4 \end{bmatrix}}$ $\newcommand{\Mqt}[5]{\begin{bmatrix} #1 & #2 & #3 & #4 & #5 \end{bmatrix}}$ $\newcommand{\Mwq}[2]{\begin{bmatrix} #1 \br #2 \end{bmatrix}}$ $\newcommand{\Meq}[3]{\begin{bmatrix} #1 \br #2 \br #3 \end{bmatrix}}$ $\newcommand{\Mrq}[4]{\begin{bmatrix} #1 \br #2 \br #3 \br #4 \end{bmatrix}}$ $\newcommand{\Mtq}[5]{\begin{bmatrix} #1 \br #2 \br #3 \br #4 \br #5 \end{bmatrix}}$ $\newcommand{\Mqw}[2]{\begin{bmatrix} #1 & #2 \end{bmatrix}}$ $\newcommand{\Mwq}[2]{\begin{bmatrix} #1 \br #2 \end{bmatrix}}$ $\newcommand{\Mww}[4]{\begin{bmatrix} #1 & #2 \br #3 & #4 \end{bmatrix}}$ $\newcommand{\Mqe}[3]{\begin{bmatrix} #1 & #2 & #3 \end{bmatrix}}$ $\newcommand{\Meq}[3]{\begin{bmatrix} #1 \br #2 \br #3 \end{bmatrix}}$ $\newcommand{\Mwe}[6]{\begin{bmatrix} #1 & #2 & #3\br #4 & #5 & #6 \end{bmatrix}}$ $\newcommand{\Mew}[6]{\begin{bmatrix} #1 & #2 \br #3 & #4 \br #5 & #6 \end{bmatrix}}$ $\newcommand{\Mee}[9]{\begin{bmatrix} #1 & #2 & #3 \br #4 & #5 & #6 \br #7 & #8 & #9 \end{bmatrix}}$
Definition: The First-Order Derivative of $\None{}f:\R^n\to\R$

The first-order derivative of $f $ denoted $Df $ is

$$Df \triangleq \left[\pder{ f }{ x_1 }, \pder{ f }{ x_2 }, … , \pder{ f }{ x_n }\right]$$

Note that the gradient $\nabla f $ is just the transpose of $Df $; that is

$$\nabla f = \transpose{ Df } $$

Definition: Hessian (The Second-Order Derivative) of $\None{}f:\R^n\to\R$

$$\b{F}(\b{x}) = D^2 \triangleq {\Mee{ \pderw{ f }{ x_1 } (\b{x})}{ … }{ \pderws{ f }{ x_n }{ x_1 } (\b{x})}{ \vdots }{ }{ \vdots }{ \pderws{ f }{ x_1 }{ x_n } (\b{x})}{ … }{ \pderw{ f }{ x_n } (\b{x})}}$$

Definition: Feasible Direction

A vector $\b{d} \in \R^n, \b{d} \neq \b{0} $, is a feasible direction at $\b{x} \in \Omega$ if

$$\exists \alpha_0 > 0\text{ such that }\forall \alpha \in [0, \alpha_0], \pare{\b{x} + \alpha \b{d}} \in \Omega$$

Definition: Directional Derivative

Let $f: \R^n \to \R $ be a real-valued function and let $\b{d} $ be a feasible direction at $\b{x} \in \Omega $. The directional derivative of $f$ in the direction $\b{d} $, denoted $\pder{ f }{ \b{d}} $, is the real-valued function defined by

$$\pder{ f }{ \b{d}} (\b{x}) \triangleq \limu{ \alpha }{ 0 } \frac{ f(\b{x} + \alpha \b{d}) - f(\b{x})}{ \alpha }$$

In particular, if $\norm{ \b{d}} = 1$, then $\pder{ f }{ \b{d}} $ is the rate of increase of $f $ at $\b{x} $ in the direction $\b{d} $.

To compute the above derivative, suppose that $\b{x} $ and $\b{d} $ are given. Then $f(x + \alpha \b{d}) $ is a function of $\alpha $, and

$$\pder{ f }{ \b{d}} (\b{x}) = \der{ }{ \alpha } f(\b{x} + \alpha \b{d}){\biggr\rvert}_{ \alpha = 0 } = \nabla \transpose{ f(\b{x})} \b{d} = \inner{ \nabla f(x)}{\b{d}} = \transpose{ \b{d}} \nabla f(\b{x})$$

In summary, if $\b{d} $ is a unit vector, then $\inner{ \nabla f(x)}{\b{d}} $ is the rate of increase of $f $ at the point $\b{x} $ in the direction $\b{d} $.

Theorem : First-Order Necessary Condition (FONC)

$\Omega \subseteq \R^n, f \in \mathcal{C}^1(\Omega)$.

If $\b{x}^{*}$ is a local minimizer of $f$ over $\Omega$, then for any feasible direction $\b{d}$ at $\b{x}^{*}$, we have

$$\inner{ \nabla f(\b{x}^{*})}{ \b{d}} = \transpose{ \b{d}} \nabla f(\b{x}^*) \geq 0$$

Proof
  </span>
</span>
<span class="proof__expand"><a>[expand]</a></span>

Define

$$\b{x}(\alpha) = \b{x}^{*} + \alpha \b{d} \in \Omega $$

Note that $\b{x}(0) = \b{x}^* $. Define the composite function

$$\phi(\alpha) = f(\b{x}(\alpha))$$

Then, by Taylor’s theorem,

$$\begin{align*} &f(x^* + \alpha \b{d}) - f(x^*) \br =& \phi(\alpha) - \phi(0) \br =& \phi'(0) \alpha + o(\alpha) \tag{Taylor’s theorem}\br =& \alpha \transpose{ \b{d}} \nabla f(\b{x} (0)) + o(\alpha) \end{align*}$$

where $\alpha \geq 0 $.

Thus, if $\phi(\alpha) \geq \phi (0) $, that is $f(\b{x}^* + \alpha \b{d}) \geq f(\b{x}^*)$ for sufficiently small values of $\alpha > 0 $ ($\b{x}^*$ is a local minimizer), then we have to have $\transpose{ \b{d}} \nabla f(\b{x} ^*) \geq 0 $.

Corollary : Interior Case of FONC

$\Omega \subseteq \R^n, f \in \mathcal{C}^1(\Omega)$.

If $\b{x}^* $ is a local minimizer of $f $ over $\Omega $ and if $\b{x}^* $ is an interior point of $\Omega $, then

$$\nabla f(\b{x}^*) = \b{0} $$

Proof
  </span>
</span>
<span class="proof__expand"><a>[expand]</a></span>

Suppose that $f $ has a local minimizer $\b{x}^* $ that is an interior point of $\Omega $.

Because $\b{x}^* $ is an interior point of $\Omega $, the set of feasible directions at $\b{x}^* $ is the whole of $\R^n $.

Thus, for any $\b{d} \in \R^n$, $\transpose{ \b{d}} \nabla f(\b{x}^*) \geq 0$ and $-\transpose{ \b{d}} \nabla f(\b{x}^*) \geq 0$.

Hence, $\transpose{ \b{d}} f(\b{x}^*) = 0 $ for all $\b{d} \in \R^n $, which implies that $\nabla f(x ^*) = \b{0}$.

Theorem : Second-Order Necessary Condition (SONC)

Let $\Omega \subseteq \R^n $, $f \in \mathcal{C}^2(\Omega) $, $\b{x}^* $ a local minimizer of $f $ over $\Omega $, and $\b{d} $ a feasible direction at $\b{x}^* $. If $\transpose{ \b{d}} f(\b{x}^*) = 0 $, then

$$\transpose{ \b{d}} \b{F}(\b{x}^*) \b{d} \geq 0$$

where $\b{F} $ is the Hessian of $f$.

Proof
    by contradiction
    
  </span>
</span>
<span class="proof__expand"><a>[expand]</a></span>

Suppose there exists a feasible direction $\b{d} $ at $\b{x}^* $ such that $\b{d}^T \nabla f(\b{x}^*) = 0 $ and $\transpose{ \b{d}} \b{F}(\b{x}^*) \b{d} < 0 $.

Let $\b{x}(\alpha) = \b{x}^* + \alpha \b{d}$ and define the composite function $\phi(\alpha) = f(\b{x}^* + \alpha \b{d}) = f(\b{x}(\alpha))$.

Then, by Taylor’s theorem

$$\phi(\alpha) = \phi(0) + \phi^{ \prime \prime } (0) \frac{ \alpha^2 }{ 2 } + o(\alpha^2)$$

where by assumption $\phi'(0) = \transpose{ \b{d}} \nabla f(\b{x}) = 0$, and $\phi^{ \prime \prime } (0) = \transpose{ \b{d}} \b{F}(\b{x}^*) \b{d} < 0$.

For sufficiently small $\alpha$,

$$\phi(\alpha) - \phi (0) = \phi^{ \prime \prime }(0) \frac{ \alpha^2 }{ 2 } + o(\alpha^2) < 0 $$

that is,

$$f(x^* + \alpha \b{d}) < f(x^*) $$

which contradicts the assumption that $x^* $ is a local minimizer. Thus,

$$\phi^{ \prime \prime } (0) = \transpose{ \b{d}} \b{F}(\b{x} ^*) \b{d} \geq 0$$

Corollary : Interior Case of SONC

Let $x^* $ be an interior point of $\Omega \subseteq \R^n $. If $\b{x}^* $ is a local minimizer of $f: \Omega \to \R, f \in \mathcal{C}^2 $, then

$$\nabla f(\b{x}^*) = \b{0} $$

and $\b{F}(\b{x}^*) $ is positive semidefinite ($F(\b{x}^*) \geq 0$); that is, for all $\b{d} \in \R^n $

$$\transpose{ \b{d}} \b{F}(\b{x}^*) \b{d} \geq 0 $$

Theorem : Second-Order Sufficient Condition (SOSC), Interior Case

Let $f\in\mathcal{C}^2$ be defined on a region where $\b{x}^*$ is an interior point. Suppose that

  1. $\nabla f(\b{x}^*) = \b{0}$; and
  2. $\b{F}(\b{x}^*) > 0$.

Then $\b{x}^*$ is a strict local minimizer of $f$.