$$\text{minimize} \quad \sum_{ i=1 }^{ m } (r_i(\b{x}))^2 $$
where $r_i : \R^n \to \R, i = 1, …, m $, are given functions.
This particular problem is called a nonlinear least-squares problem.
The special case where the $r_i $ are linear is discussed in Section 12.1.
Suppose we are given $m $ measurements of a process at $m $ points in time. Let $t_1, t_2, …, t_m $ denote the measurement times, and $y_1, y_2, …, y_m $ the measurement values (in this example, $m = 21 $, with $t_1 = 0 $ and $t_{21} = 10 $). We wish to fit a sinusoid to the measurement data. The equation of the sinusoid is $y = A \sin (\omega t + \phi) $ with appropriate choices of the parameters $A $, $\omega $, and $\phi $.
To formulate the data-fitting problem, we construct the objective function
$$\sum_{ i=1 }^{ m } (y_i - A \sin (\omega t_i + \phi))^2 $$
representing the sum of the squared errors between the measurement values and the function values at the corresponding points in time. Let $\b{x} = \transpose{ [A, \omega, \phi] } $ represent the vector of decision variables. We therefore obtain a nonlinear least-squares problem with
$$r_i(\b{x}) = y_i - A \sin (\omega t_i + \phi) $$
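Before deriving the gradient and Hessian, it may help to see these residuals in code. The following is a minimal sketch, assuming NumPy and measurement arrays \texttt{t} and \texttt{y} of times and values; the function names are illustrative, not from the text.
\begin{verbatim}
import numpy as np

def residual(x, t, y):
    # Decision variables x = [A, omega, phi]^T.
    A, w, phi = x
    return y - A * np.sin(w * t + phi)

def objective(x, t, y):
    # Sum of squared errors between measurements and model values.
    r = residual(x, t, y)
    return r @ r
\end{verbatim}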
Defining $\b{r} = \transpose{ [r_1, …, r_m] } $, we write the objective function as $f(\b{x}) = \transpose{ \b{r}(\b{x})} \b{r}(\b{x})$. To apply Newton’s method, we need to compute the gradient and the Hessian of $f $. The $j$th component of $\nabla f(\b{x}) $ is
$$\begin{align*} (\nabla f(\b{x}))_j &= \pder{ f }{ x_j }(\b{x}) \br &= 2 \sum_{ i=1 }^{ m } r_i(\b{x}) \pder{ r_i }{ x_j }(\b{x}) \end{align*}$$
Denote the Jacobian matrix of $\b{r} $ by
$$\b{J}(\b{x}) = {\Mee{ \pder{ r_1 }{ x_1 }(\b{x})}{ … }{ \pder{ r_1 }{ x_n } (\b{x})}{ \vdots }{ }{ \vdots }{ \pder{ r_m }{ x_1 }(\b{x})}{ … }{ \pder{ r_m }{ x_n } (\b{x})}} $$
Then, the gradient of $f $ can be represented as
$$\nabla f(\b{x}) = 2 \transpose{ \b{J}(\b{x})} \b{r}(\b{x}) $$
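For the sinusoid example, the $i $th row of $\b{J}(\b{x}) $ follows by differentiating $r_i $ with respect to each of $A $, $\omega $, and $\phi $:
$$\pder{ r_i }{ A }(\b{x}) = -\sin (\omega t_i + \phi), \quad \pder{ r_i }{ \omega }(\b{x}) = -A t_i \cos (\omega t_i + \phi), \quad \pder{ r_i }{ \phi }(\b{x}) = -A \cos (\omega t_i + \phi) $$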
Next, we compute the Hessian matrix of $f $. The $(k, j)$th component of the Hessian is given by
$$\begin{align*} \pderws{f}{x_k}{x_j}(\b{x}) &= \pder{ }{ x_k }\pare{\pder{ f }{ x_j }(\b{x})} \br &= \pder{ }{ x_k }\pare{2 \sum_{ i=1 }^{ m } r_i(\b{x}) \pder{ r_i }{ x_j }(\b{x})} \br &= 2\sum_{ i=1 }^{ m }\pare{\pder{ r_i }{ x_k }(\b{x}) \pder{ r_i }{ x_j }(\b{x}) + r_i(\b{x}) \pderws{r_i}{x_k}{x_j}(\b{x})} \end{align*}$$
Letting $\b{S}(\b{x}) $ be the matrix whose $(k, j) $th component is
$$\sum_{ i=1 }^{ m } r_i(\b{x}) \pderws{r_i}{x_k}{x_j}(\b{x}) $$
we write the Hessian matrix as
$$\b{F}(\b{x}) = 2(\transpose{\b{J}(\b{x})} \b{J}(\b{x}) + \b{S}(\b{x})) $$
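For the sinusoid example, writing $\theta_i = \omega t_i + \phi $ and ordering the variables as $A, \omega, \phi $, a direct computation gives the matrix of second partial derivatives of $r_i $ as
$${\Mee{ 0 }{ -t_i \cos \theta_i }{ -\cos \theta_i }{ -t_i \cos \theta_i }{ A t_i^2 \sin \theta_i }{ A t_i \sin \theta_i }{ -\cos \theta_i }{ A t_i \sin \theta_i }{ A \sin \theta_i }} $$
and each such matrix contributes to $\b{S}(\b{x}) $ with weight $r_i(\b{x}) $.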
Therefore, Newton’s method applied to the nonlinear least-squares problem (note that the factors of 2 in the gradient and the Hessian cancel) is given by
$$\b{x}^{(k+1)} = \b{x}^{(k)} - \inv{(\transpose{ \b{J}(\b{x}^{(k)})}\b{J}(\b{x}^{(k)}) + \b{S}(\b{x}^{(k)}))} \transpose{ \b{J}(\b{x}^{(k)})}\b{r}(\b{x}^{(k)})$$
In some applications, the matrix $\b{S}(\b{x}) $ involving the second derivatives of the function $\b{r} $ can be ignored because its components are negligibly small. In this case, the above Newton's algorithm reduces to what is commonly called the Gauss-Newton method, which does not require calculation of the second derivatives of $\b{r} $.
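Explicitly, dropping $\b{S} $ from the update gives the Gauss-Newton iteration
$$\b{x}^{(k+1)} = \b{x}^{(k)} - \inv{(\transpose{ \b{J}(\b{x}^{(k)})}\b{J}(\b{x}^{(k)}))} \transpose{ \b{J}(\b{x}^{(k)})}\b{r}(\b{x}^{(k)})$$
A minimal sketch of this iteration for the sinusoid example follows; it assumes NumPy and measurement arrays \texttt{t} and \texttt{y}, uses a fixed iteration count for simplicity, and solves the linear system rather than forming the inverse explicitly.
\begin{verbatim}
import numpy as np

def residual(x, t, y):
    # Same residuals as in the earlier sketch.
    A, w, phi = x
    return y - A * np.sin(w * t + phi)

def jacobian(x, t):
    # Rows are the partials of r_i with respect to A, omega, phi.
    A, w, phi = x
    theta = w * t + phi
    return np.column_stack((-np.sin(theta),
                            -A * t * np.cos(theta),
                            -A * np.cos(theta)))

def gauss_newton(x0, t, y, iters=20):
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        r = residual(x, t, y)
        J = jacobian(x, t)
        # Gauss-Newton step: solve (J^T J) d = J^T r, then x <- x - d.
        d = np.linalg.solve(J.T @ J, J.T @ r)
        x = x - d
    return x
\end{verbatim}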