# Newton's Method for Systems

University of Massachusetts Amherst
Linearize the vector function at each iterate and solve the resulting linear system. The connection to scalar Newton's method: replace division by $f'(x)$ with solving $J \Delta\mathbf{x} = -\mathbf{F}$.
## Derivation

For $\mathbf{F}: \mathbb{R}^n \to \mathbb{R}^n$, linearize around the current iterate $\mathbf{x}_k$:

$$\mathbf{F}(\mathbf{x}) \approx \mathbf{F}(\mathbf{x}_k) + D\mathbf{F}(\mathbf{x}_k)(\mathbf{x} - \mathbf{x}_k)$$

Setting the linear approximation to zero and solving for $\mathbf{x}$:

$$\mathbf{0} = \mathbf{F}(\mathbf{x}_k) + D\mathbf{F}(\mathbf{x}_k)(\mathbf{x}_{k+1} - \mathbf{x}_k)$$

This gives the Newton iteration:

$$\mathbf{x}_{k+1} = \mathbf{x}_k - [D\mathbf{F}(\mathbf{x}_k)]^{-1}\mathbf{F}(\mathbf{x}_k)$$

## The Jacobian Matrix

The Jacobian of $\mathbf{F}: \mathbb{R}^n \to \mathbb{R}^n$ at $\mathbf{x}$ is the $n \times n$ matrix of partial derivatives:
$$D\mathbf{F}(\mathbf{x}) = \begin{pmatrix}
\frac{\partial F_1}{\partial x_1} & \frac{\partial F_1}{\partial x_2} & \cdots & \frac{\partial F_1}{\partial x_n} \\
\frac{\partial F_2}{\partial x_1} & \frac{\partial F_2}{\partial x_2} & \cdots & \frac{\partial F_2}{\partial x_n} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial F_n}{\partial x_1} & \frac{\partial F_n}{\partial x_2} & \cdots & \frac{\partial F_n}{\partial x_n}
\end{pmatrix}$$

Entry $(i,j)$ is $\frac{\partial F_i}{\partial x_j}$: how the $i$-th equation changes with respect to the $j$-th variable.
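Analytic Jacobians are easy to get wrong, so it is common to check them against a finite-difference approximation. The sketch below is illustrative (the helper name `fd_jacobian` and the step size are assumptions, not part of any standard API):

```python
import numpy as np

def fd_jacobian(F, x, h=1e-7):
    """Approximate the Jacobian of F at x by forward differences.

    Column j holds (F(x + h*e_j) - F(x)) / h, matching the
    convention that entry (i, j) is dF_i/dx_j.
    """
    x = np.asarray(x, dtype=float)
    Fx = F(x)
    n = x.size
    J = np.empty((Fx.size, n))
    for j in range(n):
        xp = x.copy()
        xp[j] += h
        J[:, j] = (F(xp) - Fx) / h
    return J

# Compare against an analytic Jacobian for a simple 2D system.
F = lambda v: np.array([v[0]**2 + v[1]**2 - 4, v[0]*v[1] - 1])
J_analytic = lambda v: np.array([[2*v[0], 2*v[1]], [v[1], v[0]]])

x = np.array([2.0, 1.0])
print(np.max(np.abs(fd_jacobian(F, x) - J_analytic(x))))  # small, around 1e-7
```

The forward-difference error is $\mathcal{O}(h)$, so agreement to several digits (not machine precision) is the expected outcome.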
## The Algorithm

**Input:** $\mathbf{F}$, $D\mathbf{F}$, initial guess $\mathbf{x}_0$, tolerance $\varepsilon$, max iterations $N$

**Output:** Approximate root $\mathbf{x}$

for $k = 0, 1, 2, \ldots, N-1$:

$\qquad$ Evaluate $\mathbf{F}_k = \mathbf{F}(\mathbf{x}_k)$

$\qquad$ if $\|\mathbf{F}_k\| < \varepsilon$: return $\mathbf{x}_k$

$\qquad$ Evaluate $J_k = D\mathbf{F}(\mathbf{x}_k)$

$\qquad$ Solve $J_k \Delta\mathbf{x} = -\mathbf{F}_k$ for $\Delta\mathbf{x}$

$\qquad$ $\mathbf{x}_{k+1} \gets \mathbf{x}_k + \Delta\mathbf{x}$

return $\mathbf{x}_N$ (or indicate failure)
**Key point:** We solve the linear system in the loop rather than computing $J^{-1}$ explicitly. Use LU factorization: $\mathcal{O}(n^3)$ for the factorization of $J_k$, then $\mathcal{O}(n^2)$ for the forward/back substitution.
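The loop above might look like the following sketch in NumPy (`np.linalg.solve` performs an LU factorization internally; the function name and defaults here are illustrative):

```python
import numpy as np

def newton_system(F, DF, x0, tol=1e-12, max_iter=50):
    """Newton's method for F(x) = 0: solve J dx = -F at each step."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        Fx = F(x)
        if np.linalg.norm(Fx) < tol:
            return x
        # Solve the linear system instead of forming the inverse.
        dx = np.linalg.solve(DF(x), -Fx)
        x = x + dx
    raise RuntimeError("Newton's method did not converge")

# Example: intersection of x^2 + y^2 = 4 and xy = 1.
F  = lambda v: np.array([v[0]**2 + v[1]**2 - 4, v[0]*v[1] - 1])
DF = lambda v: np.array([[2*v[0], 2*v[1]], [v[1], v[0]]])
root = newton_system(F, DF, [2.0, 1.0])
print(root)  # close to (1.9319, 0.5176)
```

For large $n$, `scipy.linalg.lu_factor`/`lu_solve` would make the factor/solve split explicit, but `np.linalg.solve` suffices here.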
## Example: 2D System

Find the intersection of $x^2 + y^2 = 4$ and $xy = 1$.

Define:

$$\mathbf{F}(x, y) = \begin{pmatrix} x^2 + y^2 - 4 \\ xy - 1 \end{pmatrix}$$

The Jacobian:

$$D\mathbf{F}(x, y) = \begin{pmatrix} 2x & 2y \\ y & x \end{pmatrix}$$

**Example 1 (One Newton Iteration).** Starting from $(x_0, y_0) = (2, 1)$:

**Step 1:** Evaluate $\mathbf{F}$ and $D\mathbf{F}$:

$$\mathbf{F}(2, 1) = \begin{pmatrix} 4 + 1 - 4 \\ 2 - 1 \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \qquad
D\mathbf{F}(2, 1) = \begin{pmatrix} 4 & 2 \\ 1 & 2 \end{pmatrix}$$

**Step 2:** Solve $J \Delta\mathbf{x} = -\mathbf{F}$:
$$\begin{pmatrix} 4 & 2 \\ 1 & 2 \end{pmatrix} \begin{pmatrix} \Delta x \\ \Delta y \end{pmatrix} = \begin{pmatrix} -1 \\ -1 \end{pmatrix}$$

Solving by elimination:

From row 1: $4\Delta x + 2\Delta y = -1$

From row 2: $\Delta x + 2\Delta y = -1$

Subtracting row 2 from row 1: $3\Delta x = 0$, so $\Delta x = 0$. Then row 2 gives $2\Delta y = -1$, so $\Delta y = -1/2$.

**Step 3:** Update:

$$(x_1, y_1) = (2, 1) + (0, -1/2) = (2, 1/2)$$

Already $\mathbf{F}(2, 1/2) = (1/4, 0)$, much closer to zero than $\mathbf{F}(2, 1) = (1, 1)$.

## Cost Analysis

Each Newton iteration requires:
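The hand computation is easy to verify numerically. A quick sketch with NumPy:

```python
import numpy as np

# System and Jacobian from the example above.
F  = lambda v: np.array([v[0]**2 + v[1]**2 - 4, v[0]*v[1] - 1])
DF = lambda v: np.array([[2*v[0], 2*v[1]], [v[1], v[0]]])

x0 = np.array([2.0, 1.0])
dx = np.linalg.solve(DF(x0), -F(x0))  # Newton step: solve J dx = -F
x1 = x0 + dx
print(dx, x1)  # step (0, -0.5), giving x1 = (2, 0.5)
```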
| Operation | Cost | Notes |
|---|---|---|
| Evaluate $\mathbf{F}(\mathbf{x}_k)$ | $\mathcal{O}(n)$ to $\mathcal{O}(n^2)$ | Depends on $\mathbf{F}$ |
| Evaluate $D\mathbf{F}(\mathbf{x}_k)$ | $\mathcal{O}(n^2)$ | $n^2$ partial derivatives |
| LU factorization of $J_k$ | $\mathcal{O}(n^3)$ | Dominant cost |
| Solve with LU factors | $\mathcal{O}(n^2)$ | Forward/back substitution |
Total per iteration: $\mathcal{O}(n^3)$, dominated by the LU factorization of the Jacobian.
## Convergence Analysis

Newton's method for systems inherits the quadratic convergence of scalar Newton's method, but only *locally*. Understanding the convergence conditions reveals why globalization is essential.
## The Local Convergence Theorem

Let $\mathbf{F}: \mathbb{R}^n \to \mathbb{R}^n$ be continuously differentiable with $\mathbf{F}(\mathbf{x}^*) = \mathbf{0}$. If:

- $D\mathbf{F}(\mathbf{x}^*)$ is nonsingular
- $D\mathbf{F}$ is Lipschitz continuous near $\mathbf{x}^*$

then there exists $\delta > 0$ such that for $\|\mathbf{x}_0 - \mathbf{x}^*\| < \delta$, Newton's method converges *quadratically*:

$$\|\mathbf{x}_{k+1} - \mathbf{x}^*\| \leq C\|\mathbf{x}_k - \mathbf{x}^*\|^2$$

The proof mirrors the scalar case. Define the error $\mathbf{e}_k = \mathbf{x}_k - \mathbf{x}^*$.
**Step 1:** From the Newton iteration,

$$\mathbf{e}_{k+1} = \mathbf{e}_k - [D\mathbf{F}(\mathbf{x}_k)]^{-1}\mathbf{F}(\mathbf{x}_k)$$

**Step 2:** Using $\mathbf{F}(\mathbf{x}^*) = \mathbf{0}$ and a Taylor expansion:

$$\mathbf{F}(\mathbf{x}_k) = D\mathbf{F}(\mathbf{x}^*)\mathbf{e}_k + \mathcal{O}(\|\mathbf{e}_k\|^2)$$

**Step 3:** The Lipschitz condition on $D\mathbf{F}$ bounds the error in the Jacobian:

$$\|D\mathbf{F}(\mathbf{x}_k) - D\mathbf{F}(\mathbf{x}^*)\| \leq L\|\mathbf{e}_k\|$$

**Step 4:** Combining these bounds shows $\|\mathbf{e}_{k+1}\| \leq C\|\mathbf{e}_k\|^2$.
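The quadratic bound can be observed numerically by running Newton's method on the 2D example system, where the root is known in closed form ($x^{*2} = 2 + \sqrt{3}$, $y^* = 1/x^*$), and checking that $\|\mathbf{e}_{k+1}\| / \|\mathbf{e}_k\|^2$ stays roughly constant:

```python
import numpy as np

F  = lambda v: np.array([v[0]**2 + v[1]**2 - 4, v[0]*v[1] - 1])
DF = lambda v: np.array([[2*v[0], 2*v[1]], [v[1], v[0]]])

# Closed-form root: x^2 = 2 + sqrt(3), y = 1/x.
a = np.sqrt(2 + np.sqrt(3))
xstar = np.array([a, 1 / a])

x = np.array([2.0, 1.0])
errs = [np.linalg.norm(x - xstar)]
for _ in range(5):
    x = x + np.linalg.solve(DF(x), -F(x))
    errs.append(np.linalg.norm(x - xstar))

# Each error is roughly a constant times the square of the previous one.
for e_prev, e_next in zip(errs, errs[1:]):
    if e_next > 1e-15:
        print(f"error {e_next:.2e},  ratio e_(k+1)/e_k^2 = {e_next / e_prev**2:.2f}")
```

The error drops below machine precision within about five iterations, consistent with the digit-doubling behavior the theorem predicts.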
## Convergence Rates

Different methods achieve different convergence rates:

| Method | Rate | Definition | Typical Behavior |
|---|---|---|---|
| Linear | $\Vert\mathbf{e}_{k+1}\Vert \leq c\Vert\mathbf{e}_k\Vert$ | Constant ratio $c < 1$ | Errors decrease geometrically |
| Superlinear | $\lim_{k \to \infty} \Vert\mathbf{e}_{k+1}\Vert / \Vert\mathbf{e}_k\Vert = 0$ | Ratio $\to 0$ | Faster and faster |
| Quadratic | $\Vert\mathbf{e}_{k+1}\Vert \leq C\Vert\mathbf{e}_k\Vert^2$ | Errors squared | Digits double each iteration |
Starting with $\|\mathbf{e}_0\| = 0.1$:

| Iteration | Linear ($c = 0.5$) | Quadratic ($C = 1$) |
|---|---|---|
| 0 | $10^{-1}$ | $10^{-1}$ |
| 1 | $5 \times 10^{-2}$ | $10^{-2}$ |
| 2 | $2.5 \times 10^{-2}$ | $10^{-4}$ |
| 3 | $1.25 \times 10^{-2}$ | $10^{-8}$ |
| 4 | $6.25 \times 10^{-3}$ | $10^{-16}$ (machine precision!) |
Quadratic convergence reaches machine precision in 4 iterations; linear needs about 50.
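The table and the "about 50" figure follow directly from the two model recurrences, as a few lines of Python confirm:

```python
import math

# Model error recurrences: linear e <- c*e, quadratic e <- C*e^2.
c, C = 0.5, 1.0
e_lin = e_quad = 0.1
for k in range(5):
    print(f"{k}: linear {e_lin:.3e}   quadratic {e_quad:.3e}")
    e_lin, e_quad = c * e_lin, C * e_quad**2

# Iterations of the linear scheme needed to go from 0.1 to 1e-16:
print(math.ceil(math.log(1e-16 / 0.1) / math.log(c)))  # 50
```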
## The Locality Problem

The theorem guarantees convergence only if $\|\mathbf{x}_0 - \mathbf{x}^*\| < \delta$. But:

- We don't know $\mathbf{x}^*$ -- that's what we're trying to find!
- We don't know $\delta$ -- it depends on the problem
- $\delta$ can be very small -- especially for ill-conditioned problems
Consider the system:

$$\mathbf{F}(x, y) = \begin{pmatrix} x^2 + y^2 - 1 \\ (x - 1)^2 + y^2 - 1 \end{pmatrix}$$

These are two unit circles, centered at $(0, 0)$ and $(1, 0)$, that intersect at $(1/2, \pm\sqrt{3}/2)$.

The Jacobian:

$$D\mathbf{F} = \begin{pmatrix} 2x & 2y \\ 2(x - 1) & 2y \end{pmatrix}$$

has $\det D\mathbf{F} = 4y$, so it is singular exactly when $y = 0$ (the circles' line of centers). Newton iterations starting near $y = 0$ can fail dramatically. The basin of attraction is limited by this singular line.
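The effect is easy to observe numerically. The sketch below uses two unit circles centered at $(0,0)$ and $(1,0)$, whose Jacobian determinant is $4y$: as the starting point approaches the line $y = 0$, the Newton step blows up.

```python
import numpy as np

# Two unit circles centered at (0,0) and (1,0); roots at (1/2, ±sqrt(3)/2).
F  = lambda v: np.array([v[0]**2 + v[1]**2 - 1,
                         (v[0] - 1)**2 + v[1]**2 - 1])
DF = lambda v: np.array([[2*v[0],       2*v[1]],
                         [2*(v[0] - 1), 2*v[1]]])

for y0 in [0.5, 1e-3, 1e-8]:
    x = np.array([0.25, y0])
    step = np.linalg.solve(DF(x), -F(x))
    print(f"y0 = {y0:.0e}: |Newton step| = {np.linalg.norm(step):.2e}")
```

The $y$-component of the step scales like $1/y_0$, so an initial guess near the singular line is thrown far from both roots.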
## What Can Go Wrong

**1. Singular Jacobian.**

If $\det(D\mathbf{F}(\mathbf{x}_k)) = 0$, the Newton step is undefined.

At the root: if $D\mathbf{F}(\mathbf{x}^*)$ is singular, the root is called *degenerate*. Convergence (if it happens) is typically only linear.

Away from the root: this may indicate the iteration is heading in a bad direction.
**2. Nearly singular Jacobian.**

A large condition number $\kappa(D\mathbf{F})$ causes loss of precision in solving $D\mathbf{F}\,\Delta\mathbf{x} = -\mathbf{F}$ and potential divergence due to roundoff.
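A practical safeguard is to estimate $\kappa(J_k)$ before trusting the step. The sketch below is one possible guard (the helper name and the threshold `kappa_max` are illustrative choices, not a standard recipe; `np.linalg.cond` computes the 2-norm condition number):

```python
import numpy as np

def guarded_newton_step(J, Fx, kappa_max=1e12):
    """Solve J dx = -Fx, refusing when J is too ill-conditioned to trust."""
    kappa = np.linalg.cond(J)
    if kappa > kappa_max:
        raise np.linalg.LinAlgError(f"Jacobian nearly singular: cond = {kappa:.1e}")
    return np.linalg.solve(J, -Fx)

# A well-conditioned Jacobian gives a reliable step...
print(guarded_newton_step(np.array([[4.0, 2.0], [1.0, 2.0]]),
                          np.array([1.0, 1.0])))
# ...while a nearly singular one is rejected.
try:
    guarded_newton_step(np.array([[1.0, 1.0], [1.0, 1.0 + 1e-14]]),
                        np.array([1.0, 1.0]))
except np.linalg.LinAlgError as err:
    print(err)
```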
**3. Divergence.**

The iteration may oscillate in a cycle ($\mathbf{x}_{k+2} = \mathbf{x}_k$), escape to infinity ($\|\mathbf{x}_k\| \to \infty$), or converge to the wrong root.

**4. Multiple solutions.**

Newton finds a root, not necessarily the one you want.
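For the circle-hyperbola system from the 2D example, which has four roots, the root reached depends entirely on the starting point. A quick sketch:

```python
import numpy as np

F  = lambda v: np.array([v[0]**2 + v[1]**2 - 4, v[0]*v[1] - 1])
DF = lambda v: np.array([[2*v[0], 2*v[1]], [v[1], v[0]]])

def newton(x0, iters=30):
    x = np.array(x0, dtype=float)
    for _ in range(iters):
        x = x + np.linalg.solve(DF(x), -F(x))
    return x

# Four roots: (±a, ±1/a) and (±1/a, ±a), where a = sqrt(2 + sqrt(3)).
print(newton([2.0, 1.0]))    # near ( 1.93,  0.52)
print(newton([1.0, 2.0]))    # near ( 0.52,  1.93)
print(newton([-2.0, -1.0]))  # near (-1.93, -0.52)
```

All three runs converge quadratically, but to three different roots; nothing in the iteration itself selects "the" solution of interest.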