$$\text{minimize} \quad \sum_{ i=1 }^{ m } (r_i(\b{x}))^2 $$
where $r_i : \R^n \to \R, i = 1, …, m $, are given functions.
This particular problem is called a nonlinear least-squares problem.
The special case where the $r_i $ are linear is discussed in Section 12.1.
Suppose we are given $m $ measurements of a process at $m $ points in time. Let $t_1, t_2, …, t_m $ denote the measurement times, and $y_1, y_2, …, y_m $ the measurement values (in this example, $m = 21 $, with $t_1 = 0 $ and $t_{21} = 10 $). We wish to fit a sinusoid to the measurement data. The equation of the sinusoid is $y = A \sin (\omega t + \phi) $ with appropriate choices of the parameters $A $, $\omega $, and $\phi $.
To formulate the data-fitting problem, we construct the objective function
$$\sum_{ i=1 }^{ m } (y_i - A \sin (\omega t_i + \phi))^2 $$
representing the sum of the squared errors between the measurement values and the function values at the corresponding points in time. Let $\b{x} = \transpose{ [A, \omega, \phi] } $ represent the vector of decision variables. We therefore obtain a nonlinear least-squares problem with
$$r_i(\b{x}) = y_i - A \sin (\omega t_i + \phi) $$
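Before deriving the gradient and Hessian, it may help to see these residuals in code. The following is a minimal sketch, assuming NumPy and measurement arrays \texttt{t} and \texttt{y} of times and values; the function names are illustrative, not from the text.
\begin{verbatim}
import numpy as np

def residual(x, t, y):
    # Decision variables x = [A, omega, phi]^T.
    A, w, phi = x
    return y - A * np.sin(w * t + phi)

def objective(x, t, y):
    # Sum of squared errors between measurements and model values.
    r = residual(x, t, y)
    return r @ r
\end{verbatim}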
Defining $\b{r} = \transpose{ [r_1, …, r_m] } $, we write the objective function as $f(\b{x}) = \transpose{ \b{r}(\b{x})} \b{r}(\b{x})$. To apply Newton’s method, we need to compute the gradient and the Hessian of $f $. The $j$th component of $\nabla f(\b{x}) $ is
$$\begin{align*} (\nabla f(\b{x}))_j &= \pder{ f }{ x_j }(\b{x}) \br &= 2 \sum_{ i=1 }^{ m } r_i(\b{x}) \pder{ r_i }{ x_j }(\b{x}) \end{align*}$$
Denote the Jacobian matrix of $\b{r} $ by
$$\b{J}(\b{x}) = {\Mee{ \pder{ r_1 }{ x_1 }(\b{x})}{ … }{ \pder{ r_1 }{ x_n } (\b{x})}{ \vdots }{ }{ \vdots }{ \pder{ r_m }{ x_1 }(\b{x})}{ … }{ \pder{ r_m }{ x_n } (\b{x})}} $$
Then, the gradient of $f $ can be represented as
$$\nabla f(\b{x}) = 2 \transpose{ \b{J}(\b{x})} \b{r}(\b{x}) $$
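For the sinusoid example, the $i $th row of $\b{J}(\b{x}) $ follows by differentiating $r_i $ with respect to each of $A $, $\omega $, and $\phi $:
$$\pder{ r_i }{ A }(\b{x}) = -\sin (\omega t_i + \phi), \quad \pder{ r_i }{ \omega }(\b{x}) = -A t_i \cos (\omega t_i + \phi), \quad \pder{ r_i }{ \phi }(\b{x}) = -A \cos (\omega t_i + \phi) $$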
Next, we compute the Hessian matrix of $f $. The $(k, j)$th component of the Hessian is given by
$$\begin{align*} \pderws{f}{x_k}{x_j}(\b{x}) &= \pder{ }{ x_k }\pare{\pder{ f }{ x_j }(\b{x})} \br &= \pder{ }{ x_k }\pare{2 \sum_{ i=1 }^{ m } r_i(\b{x}) \pder{ r_i }{ x_j }(\b{x})} \br &= 2\sum_{ i=1 }^{ m }\pare{\pder{ r_i }{ x_k }(\b{x}) \pder{ r_i }{ x_j }(\b{x}) + r_i(\b{x}) \pderws{r_i}{x_k}{x_j}(\b{x})} \end{align*}$$
Letting $\b{S}(\b{x}) $ be the matrix whose $(k, j) $th component is
$$\sum_{ i=1 }^{ m } r_i(\b{x}) \pderws{r_i}{x_k}{x_j}(\b{x}) $$
we write the Hessian matrix as
$$\b{F}(\b{x}) = 2(\transpose{\b{J}(\b{x})} \b{J}(\b{x}) + \b{S}(\b{x})) $$
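For the sinusoid example, writing $\theta_i = \omega t_i + \phi $ and ordering the variables as $A, \omega, \phi $, a direct computation gives the matrix of second partial derivatives of $r_i $ as
$${\Mee{ 0 }{ -t_i \cos \theta_i }{ -\cos \theta_i }{ -t_i \cos \theta_i }{ A t_i^2 \sin \theta_i }{ A t_i \sin \theta_i }{ -\cos \theta_i }{ A t_i \sin \theta_i }{ A \sin \theta_i }} $$
and each such matrix contributes to $\b{S}(\b{x}) $ with weight $r_i(\b{x}) $.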
Therefore, Newton’s method applied to the nonlinear least-squares problem (note that the factors of 2 in the gradient and the Hessian cancel) is given by
$$\b{x}^{(k+1)} = \b{x}^{(k)} - \inv{(\transpose{ \b{J}(\b{x}^{(k)})}\b{J}(\b{x}^{(k)}) + \b{S}(\b{x}^{(k)}))} \transpose{ \b{J}(\b{x}^{(k)})}\b{r}(\b{x}^{(k)})$$
In some applications, the matrix $\b{S}(\b{x}) $ involving the second derivatives of the function $\b{r} $ can be ignored because its components are negligibly small. In this case, the above Newton's algorithm reduces to what is commonly called the Gauss-Newton method, which does not require calculation of the second derivatives of $\b{r} $.
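Explicitly, dropping $\b{S} $ from the update gives the Gauss-Newton iteration
$$\b{x}^{(k+1)} = \b{x}^{(k)} - \inv{(\transpose{ \b{J}(\b{x}^{(k)})}\b{J}(\b{x}^{(k)}))} \transpose{ \b{J}(\b{x}^{(k)})}\b{r}(\b{x}^{(k)})$$
A minimal sketch of this iteration for the sinusoid example follows; it assumes NumPy and measurement arrays \texttt{t} and \texttt{y}, uses a fixed iteration count for simplicity, and solves the linear system rather than forming the inverse explicitly.
\begin{verbatim}
import numpy as np

def residual(x, t, y):
    # Same residuals as in the earlier sketch.
    A, w, phi = x
    return y - A * np.sin(w * t + phi)

def jacobian(x, t):
    # Rows are the partials of r_i with respect to A, omega, phi.
    A, w, phi = x
    theta = w * t + phi
    return np.column_stack((-np.sin(theta),
                            -A * t * np.cos(theta),
                            -A * np.cos(theta)))

def gauss_newton(x0, t, y, iters=20):
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        r = residual(x, t, y)
        J = jacobian(x, t)
        # Gauss-Newton step: solve (J^T J) d = J^T r, then x <- x - d.
        d = np.linalg.solve(J.T @ J, J.T @ r)
        x = x - d
    return x
\end{verbatim}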