
Big Idea

Root finding problems can be reformulated as fixed point problems. The key insight: if $|g'(x)| < 1$ near a fixed point, the iteration $x_{n+1} = g(x_n)$ converges. This provides a unified framework for analyzing iterative methods.

Fixed Points

Definition 1 (Fixed Point)

$x_0 \in \mathbb{R}$ is a fixed point of $g(x)$ if $g(x_0) = x_0$.

The basic idea: reformulate a root-finding problem as a fixed-point problem.

$$\text{Root finding: } f(x) = 0 \quad \longrightarrow \quad \text{Fixed point: } x = g(x)$$

There are many ways to do this! Given $f(x) = 0$, you could write, for instance:

  • $x = x - f(x)$

  • $x = x + c\,f(x)$ for any constant $c \neq 0$

  • $x = x - f(x)/f'(x)$ (this turns out to be Newton’s method)

The choice of $g$ matters enormously for convergence, as we’ll see.

The Algorithm

The fixed-point iteration is beautifully simple.

Algorithm 1 (Fixed Point Iteration)

Input: Function $g$, initial guess $x_0$, tolerance $\varepsilon$, max iterations $N$

Output: Approximate fixed point $x$

  1. for $n = 0, 1, 2, \ldots, N-1$:

  2. &nbsp;&nbsp;&nbsp;&nbsp; $x_{n+1} \gets g(x_n)$

  3. &nbsp;&nbsp;&nbsp;&nbsp; if $|x_{n+1} - x_n| < \varepsilon$: return $x_{n+1}$

  4. return $x_N$ (or indicate failure)
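
In Python, this is only a few lines (a minimal sketch; the function name and defaults are our own):

```python
def fixed_point_iteration(g, x0, eps=1e-10, N=100):
    """Iterate x_{n+1} = g(x_n) until the step size drops below eps."""
    x = x0
    for _ in range(N):
        x_new = g(x)
        if abs(x_new - x) < eps:  # step-size stopping test
            return x_new
        x = x_new
    return x  # N iterations exhausted; may not have converged
```

For example, `fixed_point_iteration(lambda x: (x + 3/x) / 2, 1.0)` returns $\sqrt{3} \approx 1.7320508$ after a handful of iterations.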

That’s it. But this simplicity raises three fundamental questions:

  1. Does a fixed point even exist? And if so, is it unique?

  2. Does the iteration converge? Starting from $x_0$, will the sequence $\{x_n\}$ actually reach the fixed point?

  3. How fast does it converge? Can we do better than bisection’s linear rate?

The answer to all three turns on a single quantity: $|g'(x)|$ near the fixed point. When $|g'| < 1$, the map $g$ is a contraction: it pulls nearby points closer together, and the iteration spirals inward. When $|g'| \geq 1$, it pushes points apart, and the iteration diverges. The choice of $g$ controls $g'$, which is why different reformulations of the same problem can behave so differently.

Why the Choice of $g$ Matters

Consider finding the root of $f(x) = x^2 - 3 = 0$ (i.e., finding $\sqrt{3}$).

Here are three valid reformulations, for instance:

  • $g_1(x) = x^2 + x - 3$

  • $g_2(x) = 3/x$

  • $g_3(x) = \frac{1}{2}\left(x + \frac{3}{x}\right)$

All three have $\sqrt{3}$ as a fixed point. But their behavior is dramatically different: the iteration with $g_1$ diverges, $g_2$ bounces forever between two values (since $g_2(g_2(x)) = x$), and $g_3$ homes in on $\sqrt{3}$ almost immediately.

See Also

Fixed Point Iteration Demo: Compare G1, G2, G3 convergence behavior
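
In lieu of the interactive demo, here is a minimal sketch of the comparison, assuming G1, G2, G3 denote the three maps above:

```python
import math

g1 = lambda x: x**2 + x - 3     # |g1'(sqrt(3))| ≈ 4.46: diverges
g2 = lambda x: 3 / x            # |g2'(sqrt(3))| = 1: oscillates
g3 = lambda x: (x + 3 / x) / 2  # g3'(sqrt(3)) = 0: converges fast

root = math.sqrt(3)
for name, g in [("G1", g1), ("G2", g2), ("G3", g3)]:
    x = 1.5
    for _ in range(5):
        x = g(x)
    print(f"{name}: x_5 = {x:.6f}, error = {abs(x - root):.2e}")
```

Five iterations are enough to see the split: G3’s error is already near machine precision, G2 is stuck alternating between 1.5 and 2.0, and G1 has wandered away from the root.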

Existence, Uniqueness and Convergence

There are three separate questions to answer, each requiring its own condition:

1. Existence: Does a fixed point exist at all? This only requires that $g$ maps $[a,b]$ into itself: $g([a,b]) \subseteq [a,b]$. Geometrically, this traps the graph of $g$ inside the box $[a,b] \times [a,b]$. Since $g(a) \geq a$ and $g(b) \leq b$, the graph must cross the diagonal $y = x$. That crossing is a fixed point (left panel). But there could be many crossings.

2. Uniqueness: Is there only one fixed point? This requires $|g'(x)| \leq \rho < 1$ on the interval. The map is then a contraction: it shrinks distances. If two fixed points $c_1, c_2$ existed, the MVT gives $|c_1 - c_2| = |g(c_1) - g(c_2)| = |g'(\xi)||c_1 - c_2| \leq \rho|c_1 - c_2|$, a contradiction since $\rho < 1$. Geometrically, $|g'| < 1$ means the graph is everywhere less steep than the diagonal, so it can only cross once (center panel). When $|g'| > 1$, the graph can be steeper than the diagonal, allowing multiple crossings (right panel).

3. Convergence: Does the iteration $x_{n+1} = g(x_n)$ actually reach the fixed point? This is a question about the dynamics of iterating $g$, not just its graph. The same contraction condition $|g'| \leq \rho < 1$ guarantees convergence: each step shrinks the distance to the fixed point, so the cobweb spirals inward (center panel). When $|g'| > 1$, the iteration amplifies errors. The cobweb diverges away from the fixed point even if you start nearby (right panel).

[Figure: three-panel illustration of existence, uniqueness, and convergence; described in the caption below.]

Geometric intuition for fixed point iteration. In the cobweb diagrams (center and right), the black star ($\star$) marks the starting point $x_0$ and the red square marks the final iterate. Left: If $g$ maps $[a,b]$ into itself, its graph stays inside the blue box and must cross the diagonal $y = x$ at least once, guaranteeing existence of a fixed point. Center: When $|g'| < 1$ everywhere on $[a,b]$, the graph is less steep than the diagonal, so there is exactly one crossing. The cobweb diagram shows the iteration spiraling inward from the start ($\star$) to the unique fixed point. Right: When $g([a,b]) \subseteq [a,b]$ but $|g'| > 1$ near the center, the graph is steep enough to cross the diagonal multiple times, creating three fixed points. The cobweb starting near the central (unstable) fixed point ($\star$) diverges away from it, eventually settling at one of the stable outer fixed points where $|g'| < 1$.

Theorem 1 (Fixed Point Existence, Uniqueness, and Convergence)

Given $g: [a, b] \to \mathbb{R}$ continuous:

Existence: If $g$ maps $[a,b]$ into itself (i.e., $g([a,b]) \subseteq [a,b]$), then $g$ has at least one fixed point in $[a,b]$.

Uniqueness: If additionally $|g'(x)| \leq \rho < 1$ for all $x \in [a,b]$, then the fixed point is unique.

Convergence: Under the same conditions, for any $x_0 \in [a,b]$, the sequence $x_{n+1} = g(x_n)$ converges to the unique fixed point $c$, and the convergence is geometric:

$$|x_n - c| \leq \rho^n |x_0 - c|$$
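
Before the proofs, a quick numerical sanity check of the geometric bound, using $g(x) = \cos x$ on $[0,1]$ (our example, not from the text: $g$ maps $[0,1]$ into itself and $|g'(x)| = \sin x \leq \sin 1 \approx 0.84$ there):

```python
import math

g = math.cos
rho = math.sin(1.0)            # max |g'| on [0, 1]
c = 0.7390851332151607         # fixed point of cos (the Dottie number)

x, x0 = 1.0, 1.0
for n in range(1, 11):
    x = g(x)
    assert abs(x - c) <= rho**n * abs(x0 - c)  # theorem's bound holds
print("geometric bound verified for n = 1..10")
```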

Proof 1 (Existence)

Define $h(x) = x - g(x)$.

If $g(a) = a$ or $g(b) = b$, we’re done: we’ve found a fixed point.

Otherwise, since $g([a,b]) \subseteq [a,b]$:

  • $g(a) > a$, so $h(a) = a - g(a) < 0$

  • $g(b) < b$, so $h(b) = b - g(b) > 0$

By the Intermediate Value Theorem, there exists $c \in (a,b)$ with $h(c) = 0$, i.e., $g(c) = c$.

Proof 2 (Uniqueness)

Suppose two fixed points $c_1 < c_2$ exist. By the Mean Value Theorem:

$$|c_1 - c_2| = |g(c_1) - g(c_2)| = |g'(\xi)||c_1 - c_2| \leq \rho|c_1 - c_2|$$

for some $\xi \in (c_1, c_2)$.

This implies $(1-\rho)|c_1 - c_2| \leq 0$. But $\rho < 1$ and $c_1 \neq c_2$, so $(1-\rho)|c_1 - c_2| > 0$. This is a contradiction.

Proof 3 (Convergence)

Since $c$ is a fixed point, $g(c) = c$. Using the Mean Value Theorem:

$$|x_{n+1} - c| = |g(x_n) - g(c)| = |g'(\xi_n)||x_n - c| \leq \rho|x_n - c|$$

Applying this recursively:

$$|x_n - c| \leq \rho|x_{n-1} - c| \leq \rho^2|x_{n-2} - c| \leq \cdots \leq \rho^n|x_0 - c|$$

Since $\rho < 1$, we have $\rho^n \to 0$, so $x_n \to c$.

Remark 1 (Understanding Geometric Convergence)

The bound $|x_n - c| \leq \rho^n |x_0 - c|$ is called geometric (or linear) convergence. At each step, the error is multiplied by at most $\rho$:

$$|e_{n+1}| \leq \rho\, |e_n|$$

The value of $\rho$, called the contraction factor, controls the speed:

  • $\rho$ close to 0: fast convergence (each step eliminates most of the error).

  • $\rho$ close to 1: painfully slow (each step barely improves the approximation).

  • $\rho = 1$: no convergence guarantee.

Taking logarithms of the error bound gives $\log|e_n| \leq n \log \rho + \log|e_0|$, so geometric convergence appears as a straight line on a log-scale plot, with slope $\log \rho$. Smaller $\rho$ means a steeper downward slope.
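
In practice you can read $\rho$ off the iterates themselves: successive error ratios settle near $|g'(c)|$. A small sketch with $g = \cos$ (our running example; $|g'(c)| = \sin c \approx 0.674$):

```python
import math

c = 0.7390851332151607          # fixed point of cos
x, errors = 1.0, []
for _ in range(12):
    x = math.cos(x)
    errors.append(abs(x - c))

# ratios |e_{n+1}| / |e_n| approach |g'(c)| = sin(c) ≈ 0.674
for e_prev, e_next in zip(errors, errors[1:]):
    print(f"{e_next / e_prev:.4f}")
```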


Bounding the Forward Error from the Step Size

The convergence proof shows that $|x_{n+1} - x_n| \leq \rho |x_n - x_{n-1}|$. This means the steps are shrinking geometrically. We can use this to bound how far $x_n$ is from the true fixed point $c$, even though we don’t know $c$.

Lemma 1 (Step Size Bounds the Forward Error)

Suppose an iterative method produces a sequence $\{x_n\}$ that is contractive:

$$|x_{n+1} - x_n| \leq \rho |x_n - x_{n-1}| \quad \text{for some } \rho < 1$$

Then the sequence converges to some limit $c$, and the forward error satisfies:

$$|x_n - c| \leq \frac{|x_{n+1} - x_n|}{1 - \rho}$$

Proof 4

Since the sequence converges, we can write the error as a telescoping sum:

$$x_n - c = x_n - \lim_{k \to \infty} x_k = \sum_{k=0}^{\infty} (x_{n+k} - x_{n+k+1})$$

Taking absolute values and using the contraction property $|x_{n+k} - x_{n+k+1}| \leq \rho^k |x_n - x_{n+1}|$:

$$|x_n - c| \leq \sum_{k=0}^{\infty} \rho^k |x_n - x_{n+1}| = \frac{|x_{n+1} - x_n|}{1 - \rho}$$

This justifies the stopping test $|x_{n+1} - x_n| < \varepsilon$ used in our algorithms. If the step test is satisfied, the lemma guarantees:

$$|x_n - c| \leq \frac{\varepsilon}{1 - \rho}$$

When $\rho$ is small (fast convergence), $1/(1-\rho) \approx 1$ and the forward error is close to $\varepsilon$. When $\rho$ is close to 1 (slow convergence), the factor $1/(1-\rho)$ can be large and the step test becomes unreliable.
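
A sketch of how this looks in practice, estimating $\rho$ from the observed step ratios (the estimation scheme and the use of $\cos$ are our own choices):

```python
import math

c = 0.7390851332151607                # true fixed point, for checking only
xs = [1.0]
while len(xs) < 3 or abs(xs[-1] - xs[-2]) >= 1e-8:
    xs.append(math.cos(xs[-1]))

rho_est = abs(xs[-1] - xs[-2]) / abs(xs[-2] - xs[-3])  # step ratio ≈ rho
bound = abs(xs[-1] - xs[-2]) / (1 - rho_est)           # lemma's bound on |x_n - c|

print(f"forward-error bound: {bound:.2e}")
print(f"actual |x_n - c|   : {abs(xs[-2] - c):.2e}")
```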

The Derivative at the Fixed Point

The key insight: $|g'(c)|$ determines everything.

For our three reformulations of $x^2 - 3 = 0$, with $c = \sqrt{3}$:

  • $g_1(x) = x^2 + x - 3$: $|g_1'(c)| = 2\sqrt{3} + 1 \approx 4.46 > 1$, so the iteration diverges.

  • $g_2(x) = 3/x$: $|g_2'(c)| = 3/c^2 = 1$, so the iteration neither converges nor diverges; it oscillates.

  • $g_3(x) = \frac{1}{2}(x + 3/x)$: $g_3'(c) = \frac{1}{2}(1 - 3/c^2) = 0$, so convergence is extremely fast.

The design principle: make $|g'(c)|$ as small as possible. The optimal choice is $g'(c) = 0$, which is exactly what Newton’s method achieves by setting $g(x) = x - f(x)/f'(x)$: differentiating gives $g'(x) = f(x)f''(x)/f'(x)^2$, which vanishes at a simple root, where $f(c) = 0$ and $f'(c) \neq 0$.
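
Because $g_3'(c) = 0$, the error is roughly squared at each step rather than multiplied by a fixed $\rho$; a quick check, iterating the Newton map above:

```python
import math

g3 = lambda x: x - (x**2 - 3) / (2 * x)   # Newton map for f(x) = x^2 - 3
x, root = 1.5, math.sqrt(3)
for n in range(1, 6):
    x = g3(x)
    print(f"n={n}: error = {abs(x - root):.2e}")  # error ~squares each step
```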

The Banach Fixed Point Theorem

Note

Optional: a look ahead. The convergence results above are special cases of a far more general principle. This section is not required for what follows, but it gives a glimpse of why the same contraction idea appears in so many areas of mathematics.


Theorem 2 (Banach Fixed Point Theorem)

Let $(X, d)$ be a complete metric space and $T: X \to X$ be a contraction:

$$d(T(x), T(y)) \leq q \cdot d(x, y) \quad \text{for all } x, y \in X$$

for some $q < 1$. Then:

  1. $T$ has a unique fixed point $x^*$

  2. The iteration $x_{n+1} = T(x_n)$ converges from any starting point

  3. Convergence is geometric: $d(x_n, x^*) \leq q^n \cdot d(x_0, x^*)$

Remark 2 (Why the Banach FPT Matters)

The power of this theorem is in the generality of the space $(X, d)$. A complete metric space is any space with a notion of distance where Cauchy sequences converge. This includes:

  • $\mathbb{R}^n$ with the Euclidean distance (our setting here)

  • Function spaces like $C([a,b])$ with the sup-norm $d(f,g) = \max|f(x) - g(x)|$

  • Probability spaces like the space of distributions with the Wasserstein distance

  • Sequence spaces like $\ell^2$ with the $\ell^2$-norm

In each of these spaces, the Banach FPT gives existence, uniqueness, and convergence in one shot. The same principle governs:

  • Newton’s method for systems: The iteration is a contraction near the solution

  • Picard iteration for ODEs: Proves existence and uniqueness for $y' = f(t,y)$ by showing the integral operator is a contraction on $C([0,T])$

  • Iterative linear solvers: Jacobi and Gauss-Seidel converge when the iteration matrix is a contraction on $\mathbb{R}^n$

Whenever you see an iterative method that “works,” there is often a contraction hiding underneath.
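
To make the last bullet concrete, here is a minimal Jacobi iteration on a small diagonally dominant system (the matrix and right-hand side are our own toy example):

```python
import numpy as np

# Solve Ax = b via the fixed-point map T(x) = D^{-1} (b - R x),
# where D is the diagonal of A and R = A - D. For a strictly
# diagonally dominant A, T is a contraction on R^n.
A = np.array([[4.0, 1.0],
              [2.0, 5.0]])
b = np.array([1.0, 2.0])
D_inv = 1.0 / np.diag(A)
R = A - np.diag(np.diag(A))

x = np.zeros(2)
for _ in range(50):
    x = D_inv * (b - R @ x)   # Jacobi update

print(x)                      # ≈ np.linalg.solve(A, b) = [1/6, 1/3]
```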