# Newton's Method for Systems

University of Massachusetts Amherst
Linearize the vector function at each iterate and solve the resulting linear system. The connection to scalar Newton's method: replace division by $f'(x)$ with solving $J \Delta\mathbf{x} = -\mathbf{F}$.
## Derivation

For $\mathbf{F}: \mathbb{R}^n \to \mathbb{R}^n$, linearize around the current iterate $\mathbf{x}_k$:

$$\mathbf{F}(\mathbf{x}) \approx \mathbf{F}(\mathbf{x}_k) + D\mathbf{F}(\mathbf{x}_k)(\mathbf{x} - \mathbf{x}_k)$$

Setting the linear approximation to zero and solving for $\mathbf{x}$:

$$\mathbf{0} = \mathbf{F}(\mathbf{x}_k) + D\mathbf{F}(\mathbf{x}_k)(\mathbf{x}_{k+1} - \mathbf{x}_k)$$

This gives the Newton iteration:

$$\mathbf{x}_{k+1} = \mathbf{x}_k - [D\mathbf{F}(\mathbf{x}_k)]^{-1}\mathbf{F}(\mathbf{x}_k)$$

## The Jacobian Matrix

The Jacobian of $\mathbf{F}: \mathbb{R}^n \to \mathbb{R}^n$ at $\mathbf{x}$ is the $n \times n$ matrix of partial derivatives:
$$D\mathbf{F}(\mathbf{x}) = \begin{pmatrix}
\frac{\partial F_1}{\partial x_1} & \frac{\partial F_1}{\partial x_2} & \cdots & \frac{\partial F_1}{\partial x_n} \\
\frac{\partial F_2}{\partial x_1} & \frac{\partial F_2}{\partial x_2} & \cdots & \frac{\partial F_2}{\partial x_n} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial F_n}{\partial x_1} & \frac{\partial F_n}{\partial x_2} & \cdots & \frac{\partial F_n}{\partial x_n}
\end{pmatrix}$$

Entry $(i,j)$ is $\frac{\partial F_i}{\partial x_j}$: how the $i$-th equation changes with respect to the $j$-th variable.
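Analytic Jacobians are easy to get wrong, so it is common to check them against a finite-difference approximation. The sketch below is illustrative (the helper name `fd_jacobian` and the step size are assumptions, not part of any standard API):

```python
import numpy as np

def fd_jacobian(F, x, h=1e-7):
    """Approximate the Jacobian of F at x by forward differences.

    Column j holds (F(x + h*e_j) - F(x)) / h, matching the
    convention that entry (i, j) is dF_i/dx_j.
    """
    x = np.asarray(x, dtype=float)
    Fx = F(x)
    n = x.size
    J = np.empty((Fx.size, n))
    for j in range(n):
        xp = x.copy()
        xp[j] += h
        J[:, j] = (F(xp) - Fx) / h
    return J

# Compare against an analytic Jacobian for a simple 2D system.
F = lambda v: np.array([v[0]**2 + v[1]**2 - 4, v[0]*v[1] - 1])
J_analytic = lambda v: np.array([[2*v[0], 2*v[1]], [v[1], v[0]]])

x = np.array([2.0, 1.0])
print(np.max(np.abs(fd_jacobian(F, x) - J_analytic(x))))  # small, around 1e-7
```

The forward-difference error is $\mathcal{O}(h)$, so agreement to several digits (not machine precision) is the expected outcome.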
## The Algorithm

**Input:** $\mathbf{F}$, $D\mathbf{F}$, initial guess $\mathbf{x}_0$, tolerance $\varepsilon$, max iterations $N$

**Output:** Approximate root $\mathbf{x}$

for $k = 0, 1, 2, \ldots, N-1$:

$\qquad$ Evaluate $\mathbf{F}_k = \mathbf{F}(\mathbf{x}_k)$

$\qquad$ if $\|\mathbf{F}_k\| < \varepsilon$: return $\mathbf{x}_k$

$\qquad$ Evaluate $J_k = D\mathbf{F}(\mathbf{x}_k)$

$\qquad$ Solve $J_k \Delta\mathbf{x} = -\mathbf{F}_k$ for $\Delta\mathbf{x}$

$\qquad$ $\mathbf{x}_{k+1} \gets \mathbf{x}_k + \Delta\mathbf{x}$

return $\mathbf{x}_N$ (or indicate failure)
**Key point:** We solve the linear system in the loop rather than computing $J^{-1}$ explicitly. Use LU factorization: $\mathcal{O}(n^3)$ for the factorization of $J_k$, then $\mathcal{O}(n^2)$ for the forward/back substitution.
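The loop above might look like the following sketch in NumPy (`np.linalg.solve` performs an LU factorization internally; the function name and defaults here are illustrative):

```python
import numpy as np

def newton_system(F, DF, x0, tol=1e-12, max_iter=50):
    """Newton's method for F(x) = 0: solve J dx = -F at each step."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        Fx = F(x)
        if np.linalg.norm(Fx) < tol:
            return x
        # Solve the linear system instead of forming the inverse.
        dx = np.linalg.solve(DF(x), -Fx)
        x = x + dx
    raise RuntimeError("Newton's method did not converge")

# Example: intersection of x^2 + y^2 = 4 and xy = 1.
F  = lambda v: np.array([v[0]**2 + v[1]**2 - 4, v[0]*v[1] - 1])
DF = lambda v: np.array([[2*v[0], 2*v[1]], [v[1], v[0]]])
root = newton_system(F, DF, [2.0, 1.0])
print(root)  # close to (1.9319, 0.5176)
```

For large $n$, `scipy.linalg.lu_factor`/`lu_solve` would make the factor/solve split explicit, but `np.linalg.solve` suffices here.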
## Example: 2D System

Find the intersection of $x^2 + y^2 = 4$ and $xy = 1$.

Define:

$$\mathbf{F}(x, y) = \begin{pmatrix} x^2 + y^2 - 4 \\ xy - 1 \end{pmatrix}$$

The Jacobian:

$$D\mathbf{F}(x, y) = \begin{pmatrix} 2x & 2y \\ y & x \end{pmatrix}$$

**Example 1 (One Newton Iteration).** Starting from $(x_0, y_0) = (2, 1)$:

**Step 1:** Evaluate $\mathbf{F}$ and $D\mathbf{F}$:

$$\mathbf{F}(2, 1) = \begin{pmatrix} 4 + 1 - 4 \\ 2 - 1 \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \qquad
D\mathbf{F}(2, 1) = \begin{pmatrix} 4 & 2 \\ 1 & 2 \end{pmatrix}$$

**Step 2:** Solve $J \Delta\mathbf{x} = -\mathbf{F}$:
$$\begin{pmatrix} 4 & 2 \\ 1 & 2 \end{pmatrix} \begin{pmatrix} \Delta x \\ \Delta y \end{pmatrix} = \begin{pmatrix} -1 \\ -1 \end{pmatrix}$$

Solving by elimination:

From row 1: $4\Delta x + 2\Delta y = -1$

From row 2: $\Delta x + 2\Delta y = -1$

Subtracting row 2 from row 1: $3\Delta x = 0$, so $\Delta x = 0$. Then row 2 gives $2\Delta y = -1$, so $\Delta y = -1/2$.

**Step 3:** Update:

$$(x_1, y_1) = (2, 1) + (0, -1/2) = (2, 1/2)$$

Already $\mathbf{F}(2, 1/2) = (1/4, 0)$, much closer to zero than $\mathbf{F}(2, 1) = (1, 1)$.

## Cost Analysis

Each Newton iteration requires:
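The hand computation is easy to verify numerically. A quick sketch with NumPy:

```python
import numpy as np

# System and Jacobian from the example above.
F  = lambda v: np.array([v[0]**2 + v[1]**2 - 4, v[0]*v[1] - 1])
DF = lambda v: np.array([[2*v[0], 2*v[1]], [v[1], v[0]]])

x0 = np.array([2.0, 1.0])
dx = np.linalg.solve(DF(x0), -F(x0))  # Newton step: solve J dx = -F
x1 = x0 + dx
print(dx, x1)  # step (0, -0.5), giving x1 = (2, 0.5)
```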
| Operation | Cost | Notes |
|---|---|---|
| Evaluate $\mathbf{F}(\mathbf{x}_k)$ | $\mathcal{O}(n)$ to $\mathcal{O}(n^2)$ | Depends on $\mathbf{F}$ |
| Evaluate $D\mathbf{F}(\mathbf{x}_k)$ | $\mathcal{O}(n^2)$ | $n^2$ partial derivatives |
| LU factorization of $J_k$ | $\mathcal{O}(n^3)$ | Dominant cost |
| Solve with LU factors | $\mathcal{O}(n^2)$ | Forward/back substitution |
Total per iteration: $\mathcal{O}(n^3)$, dominated by the LU factorization of the Jacobian.
## Convergence Analysis

Newton's method for systems inherits the quadratic convergence of scalar Newton's method, but only *locally*. Understanding the convergence conditions reveals why globalization is essential.
## The Local Convergence Theorem

Let $\mathbf{F}: \mathbb{R}^n \to \mathbb{R}^n$ be continuously differentiable with $\mathbf{F}(\mathbf{x}^*) = \mathbf{0}$. If:

- $D\mathbf{F}(\mathbf{x}^*)$ is nonsingular
- $D\mathbf{F}$ is Lipschitz continuous near $\mathbf{x}^*$

then there exists $\delta > 0$ such that for $\|\mathbf{x}_0 - \mathbf{x}^*\| < \delta$, Newton's method converges *quadratically*:

$$\|\mathbf{x}_{k+1} - \mathbf{x}^*\| \leq C\|\mathbf{x}_k - \mathbf{x}^*\|^2$$

The proof mirrors the scalar case. Define the error $\mathbf{e}_k = \mathbf{x}_k - \mathbf{x}^*$.
**Step 1:** From the Newton iteration,

$$\mathbf{e}_{k+1} = \mathbf{e}_k - [D\mathbf{F}(\mathbf{x}_k)]^{-1}\mathbf{F}(\mathbf{x}_k)$$

**Step 2:** Using $\mathbf{F}(\mathbf{x}^*) = \mathbf{0}$ and a Taylor expansion:

$$\mathbf{F}(\mathbf{x}_k) = D\mathbf{F}(\mathbf{x}^*)\mathbf{e}_k + \mathcal{O}(\|\mathbf{e}_k\|^2)$$

**Step 3:** The Lipschitz condition on $D\mathbf{F}$ bounds the error in the Jacobian:

$$\|D\mathbf{F}(\mathbf{x}_k) - D\mathbf{F}(\mathbf{x}^*)\| \leq L\|\mathbf{e}_k\|$$

**Step 4:** Combining these bounds shows $\|\mathbf{e}_{k+1}\| \leq C\|\mathbf{e}_k\|^2$.
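The quadratic bound can be observed numerically by running Newton's method on the 2D example system, where the root is known in closed form ($x^{*2} = 2 + \sqrt{3}$, $y^* = 1/x^*$), and checking that $\|\mathbf{e}_{k+1}\| / \|\mathbf{e}_k\|^2$ stays roughly constant:

```python
import numpy as np

F  = lambda v: np.array([v[0]**2 + v[1]**2 - 4, v[0]*v[1] - 1])
DF = lambda v: np.array([[2*v[0], 2*v[1]], [v[1], v[0]]])

# Closed-form root: x^2 = 2 + sqrt(3), y = 1/x.
a = np.sqrt(2 + np.sqrt(3))
xstar = np.array([a, 1 / a])

x = np.array([2.0, 1.0])
errs = [np.linalg.norm(x - xstar)]
for _ in range(5):
    x = x + np.linalg.solve(DF(x), -F(x))
    errs.append(np.linalg.norm(x - xstar))

# Each error is roughly a constant times the square of the previous one.
for e_prev, e_next in zip(errs, errs[1:]):
    if e_next > 1e-15:
        print(f"error {e_next:.2e},  ratio e_(k+1)/e_k^2 = {e_next / e_prev**2:.2f}")
```

The error drops below machine precision within about five iterations, consistent with the digit-doubling behavior the theorem predicts.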
## Convergence Rates

Different methods achieve different convergence rates:

| Method | Rate | Definition | Typical Behavior |
|---|---|---|---|
| Linear | $\Vert\mathbf{e}_{k+1}\Vert \leq c\Vert\mathbf{e}_k\Vert$ | Constant ratio $c < 1$ | Errors decrease geometrically |
| Superlinear | $\lim_{k \to \infty} \Vert\mathbf{e}_{k+1}\Vert / \Vert\mathbf{e}_k\Vert = 0$ | Ratio $\to 0$ | Faster and faster |
| Quadratic | $\Vert\mathbf{e}_{k+1}\Vert \leq C\Vert\mathbf{e}_k\Vert^2$ | Errors squared | Digits double each iteration |
Starting with $\|\mathbf{e}_0\| = 0.1$:

| Iteration | Linear ($c = 0.5$) | Quadratic ($C = 1$) |
|---|---|---|
| 0 | $10^{-1}$ | $10^{-1}$ |
| 1 | $5 \times 10^{-2}$ | $10^{-2}$ |
| 2 | $2.5 \times 10^{-2}$ | $10^{-4}$ |
| 3 | $1.25 \times 10^{-2}$ | $10^{-8}$ |
| 4 | $6.25 \times 10^{-3}$ | $10^{-16}$ (machine precision!) |
Quadratic convergence reaches machine precision in 4 iterations; linear needs about 50.
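The table and the "about 50" figure follow directly from the two model recurrences, as a few lines of Python confirm:

```python
import math

# Model error recurrences: linear e <- c*e, quadratic e <- C*e^2.
c, C = 0.5, 1.0
e_lin = e_quad = 0.1
for k in range(5):
    print(f"{k}: linear {e_lin:.3e}   quadratic {e_quad:.3e}")
    e_lin, e_quad = c * e_lin, C * e_quad**2

# Iterations of the linear scheme needed to go from 0.1 to 1e-16:
print(math.ceil(math.log(1e-16 / 0.1) / math.log(c)))  # 50
```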
## The Locality Problem

The theorem guarantees convergence only if $\|\mathbf{x}_0 - \mathbf{x}^*\| < \delta$. But:

- We don't know $\mathbf{x}^*$ -- that's what we're trying to find!
- We don't know $\delta$ -- it depends on the problem
- $\delta$ can be very small -- especially for ill-conditioned problems
Consider the system:

$$\mathbf{F}(x, y) = \begin{pmatrix} x^2 + y^2 - 1 \\ (x - 1)^2 + y^2 - 1 \end{pmatrix}$$

These are two unit circles, centered at $(0, 0)$ and $(1, 0)$, that intersect at $(1/2, \pm\sqrt{3}/2)$.

The Jacobian:

$$D\mathbf{F} = \begin{pmatrix} 2x & 2y \\ 2(x - 1) & 2y \end{pmatrix}$$

has $\det D\mathbf{F} = 4y$, so it is singular exactly when $y = 0$ (the circles' line of centers). Newton iterations starting near $y = 0$ can fail dramatically. The basin of attraction is limited by this singular line.
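The effect is easy to observe numerically. The sketch below uses two unit circles centered at $(0,0)$ and $(1,0)$, whose Jacobian determinant is $4y$: as the starting point approaches the line $y = 0$, the Newton step blows up.

```python
import numpy as np

# Two unit circles centered at (0,0) and (1,0); roots at (1/2, ±sqrt(3)/2).
F  = lambda v: np.array([v[0]**2 + v[1]**2 - 1,
                         (v[0] - 1)**2 + v[1]**2 - 1])
DF = lambda v: np.array([[2*v[0],       2*v[1]],
                         [2*(v[0] - 1), 2*v[1]]])

for y0 in [0.5, 1e-3, 1e-8]:
    x = np.array([0.25, y0])
    step = np.linalg.solve(DF(x), -F(x))
    print(f"y0 = {y0:.0e}: |Newton step| = {np.linalg.norm(step):.2e}")
```

The $y$-component of the step scales like $1/y_0$, so an initial guess near the singular line is thrown far from both roots.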
## What Can Go Wrong

**1. Singular Jacobian.**

If $\det(D\mathbf{F}(\mathbf{x}_k)) = 0$, the Newton step is undefined.

At the root: if $D\mathbf{F}(\mathbf{x}^*)$ is singular, the root is called *degenerate*. Convergence (if it happens) is typically only linear.

Away from the root: this may indicate the iteration is heading in a bad direction.
**2. Nearly singular Jacobian.**

A large condition number $\kappa(D\mathbf{F})$ causes loss of precision in solving $D\mathbf{F}\,\Delta\mathbf{x} = -\mathbf{F}$ and potential divergence due to roundoff.
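A practical safeguard is to estimate $\kappa(J_k)$ before trusting the step. The sketch below is one possible guard (the helper name and the threshold `kappa_max` are illustrative choices, not a standard recipe; `np.linalg.cond` computes the 2-norm condition number):

```python
import numpy as np

def guarded_newton_step(J, Fx, kappa_max=1e12):
    """Solve J dx = -Fx, refusing when J is too ill-conditioned to trust."""
    kappa = np.linalg.cond(J)
    if kappa > kappa_max:
        raise np.linalg.LinAlgError(f"Jacobian nearly singular: cond = {kappa:.1e}")
    return np.linalg.solve(J, -Fx)

# A well-conditioned Jacobian gives a reliable step...
print(guarded_newton_step(np.array([[4.0, 2.0], [1.0, 2.0]]),
                          np.array([1.0, 1.0])))
# ...while a nearly singular one is rejected.
try:
    guarded_newton_step(np.array([[1.0, 1.0], [1.0, 1.0 + 1e-14]]),
                        np.array([1.0, 1.0]))
except np.linalg.LinAlgError as err:
    print(err)
```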
**3. Divergence.**

The iteration may oscillate in a cycle ($\mathbf{x}_{k+2} = \mathbf{x}_k$), escape to infinity ($\|\mathbf{x}_k\| \to \infty$), or converge to the wrong root.

**4. Multiple solutions.**

Newton finds a root, not necessarily the one you want.
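For the circle-hyperbola system from the 2D example, which has four roots, the root reached depends entirely on the starting point. A quick sketch:

```python
import numpy as np

F  = lambda v: np.array([v[0]**2 + v[1]**2 - 4, v[0]*v[1] - 1])
DF = lambda v: np.array([[2*v[0], 2*v[1]], [v[1], v[0]]])

def newton(x0, iters=30):
    x = np.array(x0, dtype=float)
    for _ in range(iters):
        x = x + np.linalg.solve(DF(x), -F(x))
    return x

# Four roots: (±a, ±1/a) and (±1/a, ±a), where a = sqrt(2 + sqrt(3)).
print(newton([2.0, 1.0]))    # near ( 1.93,  0.52)
print(newton([1.0, 2.0]))    # near ( 0.52,  1.93)
print(newton([-2.0, -1.0]))  # near (-1.93, -0.52)
```

All three runs converge quadratically, but to three different roots; nothing in the iteration itself selects "the" solution of interest.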