# Newton's Method
## Derivation

Given $f(x) = 0$ and an initial guess $x_0$, approximate $f$ by its tangent line at $x_0$:

$$y = f(x_0) + f'(x_0)(x - x_0)$$

Setting $y = 0$ and solving for $x$ gives the next approximation:

$$x_1 = x_0 - \frac{f(x_0)}{f'(x_0)}$$

Repeating this process:

$$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}$$

## Newton's Method as Fixed Point Iteration

Newton's method is a fixed point iteration with:
$$g(x) = x - \frac{f(x)}{f'(x)}$$

If $x_n \to c$, then taking the limit on both sides of the iteration (using continuity of $g$) gives:

$$c = c - \frac{f(c)}{f'(c)} \implies f(c) = 0$$

So the fixed point is indeed a root of $f$.
## The Algorithm

**Input:** functions $f$ and $f'$, initial guess $x_0$, tolerance $\varepsilon$, maximum iterations $N$

**Output:** approximate root $x$

for $n = 0, 1, 2, \ldots, N-1$:

$\qquad x_{n+1} \gets x_n - f(x_n)/f'(x_n)$

$\qquad$ if $|x_{n+1} - x_n| < \varepsilon$: return $x_{n+1}$

return $x_N$ (or indicate failure)
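A minimal Python sketch of this pseudocode (the names `newton`, `eps`, and `max_iter` are my own choices; the notes specify only the algorithm itself):

```python
def newton(f, fprime, x0, eps=1e-12, max_iter=50):
    """Approximate a root of f by Newton's method, starting from x0."""
    x = x0
    for _ in range(max_iter):
        fp = fprime(x)
        if fp == 0:
            raise ZeroDivisionError("f'(x) = 0: Newton step undefined")
        x_new = x - f(x) / fp
        if abs(x_new - x) < eps:   # stopping criterion from the pseudocode
            return x_new
        x = x_new
    raise RuntimeError("no convergence within max_iter iterations")

# Example: the root of x^2 - 3 near x0 = 2 is sqrt(3)
print(newton(lambda x: x**2 - 3, lambda x: 2 * x, 2.0))  # 1.7320508075688772
```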
## Convergence Analysis

**Theorem 1 (Local Convergence).** Suppose $f \in \mathcal{C}^2([a,b])$, $f(c) = 0$ for some $c \in (a,b)$, and $c$ is a simple root (i.e., $f'(c) \neq 0$). Then for $x_0$ sufficiently close to $c$, Newton's method converges to $c$.
**Proof 1.** The iteration function is $g(x) = x - \frac{f(x)}{f'(x)}$. Computing the derivative:

$$g'(x) = 1 - \frac{(f'(x))^2 - f(x)f''(x)}{(f'(x))^2} = \frac{f(x)f''(x)}{(f'(x))^2}$$

At the root:

$$g'(c) = \frac{f(c)f''(c)}{(f'(c))^2} = 0$$

Since $g'(c) = 0$ and $g'$ is continuous, there exists $\delta > 0$ such that $|g'(x)| < 1$ for $x \in (c-\delta, c+\delta)$. By the fixed point convergence theorem, the iteration converges.
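As a quick numerical check of this contraction property (my own illustration, reusing $f(x) = x^2 - 3$, for which $g'(x)$ simplifies to $(x^2 - 3)/(2x^2)$):

```python
# g'(x) = f(x) f''(x) / (f'(x))^2 with f(x) = x^2 - 3 simplifies to (x^2 - 3) / (2 x^2)
def gprime(x):
    return (x**2 - 3) / (2 * x**2)

for x in [2.0, 1.8, 1.75, 1.7320508]:
    print(f"x = {x:<9}  |g'(x)| = {abs(gprime(x)):.2e}")
# |g'| stays below 1 throughout this neighborhood and tends to 0 at the root,
# so g is a contraction near sqrt(3).
```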
**Theorem 2 (Quadratic Convergence).** Under the same assumptions as Theorem 1, Newton's method converges with order at least $p = 2$ (quadratic convergence). Specifically,

$$|x_{n+1} - c| \leq M\,|x_n - c|^2$$

for some constant $M > 0$ depending on $f$.
**Proof 2.** This follows from the fixed point order theorem: since $g'(c) = 0$ but generally $g''(c) \neq 0$, the convergence is quadratic. More precisely, Taylor expansion of $g$ around $c$ gives:

$$x_{n+1} - c = g(x_n) - g(c) = \underbrace{g'(c)}_{=0}(x_n - c) + \frac{g''(\xi)}{2}(x_n - c)^2$$

for some $\xi$ between $x_n$ and $c$.
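The quadratic relation can be observed numerically. An illustrative check (my own example, not from the notes) using $f(x) = x^3 - 2$, whose exact root is $c = 2^{1/3}$:

```python
# Quadratic convergence check for f(x) = x^3 - 2, exact root c = 2^(1/3)
c = 2 ** (1 / 3)
x, errors = 1.5, []
for _ in range(5):
    errors.append(abs(x - c))
    x -= (x**3 - 2) / (3 * x**2)          # Newton step

# e_{n+1} / e_n^2 should approach M = |f''(c) / (2 f'(c))| = 1/c ~ 0.794
for e_prev, e_next in zip(errors, errors[1:]):
    print(e_next / e_prev**2)
```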
**Remark 1 (Newton's Method as a Local Contraction).** The convergence proof reveals why Newton's method is "local": it is a contraction mapping only near the root! Recall from the Banach Fixed Point Theorem that contractions converge geometrically. For Newton's iteration $g(x) = x - f(x)/f'(x)$, the condition $|g'(x)| < 1$ from fixed-point theory tells us exactly when the iterates move closer to the root; Newton's special feature is that $g'(c) = 0$, which upgrades the convergence from linear to quadratic.
## Example: Computing $\sqrt{3}$

**Example 1 (Babylonian Method).** For $f(x) = x^2 - 3$, Newton's method gives:

$$x_{n+1} = x_n - \frac{x_n^2 - 3}{2x_n} = \frac{x_n^2 + 3}{2x_n} = \frac{1}{2}\left(x_n + \frac{3}{x_n}\right)$$

This is the Babylonian method for square roots! The number of correct digits roughly doubles each iteration.

Starting from $x_0 = 2$:

| $n$ | $x_n$     | Error               |
|-----|-----------|---------------------|
| 0   | 2.0       | 0.27                |
| 1   | 1.75      | 0.018               |
| 2   | 1.732143  | $9 \times 10^{-5}$  |
| 3   | 1.7320508 | $2 \times 10^{-9}$  |
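The table is straightforward to reproduce (a short sketch; `math.sqrt(3)` serves as the reference value for the error column):

```python
import math

x = 2.0                              # x_0 from the example
for n in range(4):
    print(n, x, abs(x - math.sqrt(3)))
    x = 0.5 * (x + 3 / x)            # Babylonian step: x <- (x + 3/x) / 2
```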
## Newton's Method in Higher Dimensions

For a vector-valued function $\mathbf{f}: \mathbb{R}^n \to \mathbb{R}^n$, Newton's method becomes:

$$\mathbf{x}_{n+1} = \mathbf{x}_n - [D\mathbf{f}(\mathbf{x}_n)]^{-1}\mathbf{f}(\mathbf{x}_n)$$

where $D\mathbf{f}$ is the Jacobian matrix.

In 2D, for $\mathbf{f}(x,y) = \begin{pmatrix} f_1(x,y) \\ f_2(x,y) \end{pmatrix}$:

$$\begin{pmatrix} x_{n+1} \\ y_{n+1} \end{pmatrix} = \begin{pmatrix} x_n \\ y_n \end{pmatrix} - \begin{pmatrix} \frac{\partial f_1}{\partial x} & \frac{\partial f_1}{\partial y} \\ \frac{\partial f_2}{\partial x} & \frac{\partial f_2}{\partial y} \end{pmatrix}^{-1} \begin{pmatrix} f_1(x_n, y_n) \\ f_2(x_n, y_n) \end{pmatrix}$$

Each Newton step requires solving a linear system; this motivates the linear algebra chapters!
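In practice the Jacobian is never inverted explicitly: each step solves $D\mathbf{f}(\mathbf{x}_n)\,\boldsymbol{\delta} = -\mathbf{f}(\mathbf{x}_n)$ and sets $\mathbf{x}_{n+1} = \mathbf{x}_n + \boldsymbol{\delta}$. A NumPy sketch on an illustrative system of my own choosing (the circle $x^2 + y^2 = 4$ intersected with the hyperbola $xy = 1$):

```python
import numpy as np

def F(v):                                  # the system f(x, y) = 0
    x, y = v
    return np.array([x**2 + y**2 - 4.0, x * y - 1.0])

def J(v):                                  # its Jacobian matrix
    x, y = v
    return np.array([[2 * x, 2 * y],
                     [y,     x    ]])

v = np.array([2.0, 0.5])                   # initial guess
for _ in range(20):
    delta = np.linalg.solve(J(v), -F(v))   # solve the linear system; don't invert
    v = v + delta
    if np.linalg.norm(delta) < 1e-12:
        break

print(v)   # ~ [1.93185, 0.51764], i.e. x = sqrt(2 + sqrt(3)), y = 1/x
```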
## Multiple Roots

**Remark 2 (Modified Newton for Multiple Roots).** At a root of multiplicity $p > 1$, $f'(c) = 0$ and Newton's method slows to linear convergence. If the multiplicity $p$ is known, the modified iteration

$$x_{n+1} = x_n - p\,\frac{f(x_n)}{f'(x_n)}$$

restores quadratic convergence.
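A small comparison (my own example: $f(x) = (x-1)^2(x+1)$, which has a double root at $x = 1$, so $p = 2$) shows the standard iteration creeping linearly while the modified one converges quadratically:

```python
# f has a root of multiplicity p = 2 at x = 1
f  = lambda x: (x - 1)**2 * (x + 1)
fp = lambda x: (x - 1) * (3 * x + 1)       # f'(x) in factored form

x_std, x_mod = 2.0, 2.0
for n in range(5):
    print(n, abs(x_std - 1), abs(x_mod - 1))
    x_std -= f(x_std) / fp(x_std)          # standard step: error roughly halves
    x_mod -= 2 * f(x_mod) / fp(x_mod)      # modified step with p = 2: quadratic
```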
## Advantages and Disadvantages

**Advantages:**

- **Fast:** quadratic convergence; the number of correct digits doubles each iteration
- **Generalizes:** extends naturally to systems in $\mathbb{R}^n$
- **Foundation:** basis for many advanced optimization methods
**Disadvantages:**

- **Requires derivative:** needs $f'(x)$; can be approximated with finite differences
- **Local only:** may diverge from a bad initial guess (see the sketch after this list)
- **Singular points:** fails where $f'(x) \approx 0$
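To make the locality concrete, here is an illustrative divergence run of my own (not from the notes): for $f(x) = \arctan x$, whose only root is $x = 0$, Newton's iteration diverges whenever $|x_0|$ exceeds roughly $1.39$.

```python
import math

x = 2.0                                   # starting too far from the root at 0
for n in range(5):
    x -= math.atan(x) * (1 + x**2)        # Newton step for f(x) = arctan(x)
    print(n, x)                           # |x| blows up: the iteration diverges
```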
**Variants:**

- **Secant method:** approximates $f'(x_n)$ using finite differences; order of convergence $\approx 1.618$ (the golden ratio!); see the sketch below
- **Quasi-Newton:** for systems, approximates the Jacobian to reduce cost
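A sketch of the secant variant listed above (the function name and tolerances are my own choices): the derivative is replaced by the slope through the two most recent iterates, so no $f'$ is needed.

```python
def secant(f, x0, x1, eps=1e-12, max_iter=50):
    """Approximate a root of f by the secant method, from two initial guesses."""
    for _ in range(max_iter):
        slope = (f(x1) - f(x0)) / (x1 - x0)   # finite-difference stand-in for f'(x_n)
        x0, x1 = x1, x1 - f(x1) / slope       # secant step
        if abs(x1 - x0) < eps:
            return x1
    raise RuntimeError("no convergence within max_iter iterations")

# Same root as before, but no derivative required
print(secant(lambda x: x**2 - 3, 1.0, 2.0))   # 1.7320508075688772
```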