Big Idea

A computed root is never exact. The question is: how wrong can it be? Condition numbers tell us how sensitive the root is to perturbations in the problem, and forward/backward error gives us a framework for interpreting what our algorithm actually computed.

The Absolute Condition Number

Before analyzing specific root-finding methods, we need to understand a more fundamental question: how sensitive is the root itself to perturbations in the problem? We introduced condition numbers and stability in the error and stability chapter. Here we apply these ideas specifically to root finding. This is independent of the algorithm used. Even with exact arithmetic, if the problem data is slightly wrong (due to measurement error, rounding, or modelling approximations), the root will shift. The condition number quantifies this sensitivity.

For root finding, we work with the absolute condition number rather than the relative condition number. The reason is fundamental: the input to a root-finding problem is the function value, which at the root is exactly zero ($f(x^*) = 0$), and relative error is undefined when the quantity is zero. Since we are perturbing around zero, only absolute perturbations make sense.

Definition 1 (Absolute Condition Number)

Consider a problem that maps input data $x$ to output $y = F(x)$. The absolute condition number is the worst-case ratio of absolute output change to absolute input change for small perturbations:

$$\hat{\kappa} = \lim_{\delta \to 0} \ \sup_{0 < |\Delta x| \leq \delta} \frac{|F(x + \Delta x) - F(x)|}{|\Delta x|}$$

When $F$ is differentiable, this is exactly $\hat{\kappa} = |F'(x)|$.

A problem is well-conditioned if $\hat{\kappa}$ is small and ill-conditioned if $\hat{\kappa}$ is large.

Unlike the relative condition number, the limit here cannot be avoided. In the relative case, the perturbation $|\Delta x|$ cancels between numerator and denominator, keeping the ratio bounded. For the absolute condition number, no such cancellation occurs: if $F$ is nonlinear, the ratio $|F(x + \Delta x) - F(x)|/|\Delta x|$ can grow without bound as $|\Delta x|$ increases. The limit restricts attention to infinitesimal perturbations, where the ratio reduces to $|F'(x)|$.
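As a quick sanity check of the differentiable case, the sketch below (our own illustration; $F = \sin$ and $x = 1$ are arbitrary choices) shows the finite-difference ratio approaching $|F'(x)|$ as the perturbation shrinks:

```python
import numpy as np

# A minimal check that the ratio |F(x + dx) - F(x)| / |dx| approaches
# |F'(x)| as dx -> 0. F = sin and x = 1 are arbitrary smooth choices.
F, Fprime = np.sin, np.cos
x = 1.0

for delta in [1e-1, 1e-3, 1e-5, 1e-7]:
    ratio = abs(F(x + delta) - F(x)) / delta
    print(f"delta = {delta:.0e}   ratio = {ratio:.10f}")

print(f"|F'(x)| = {abs(Fprime(x)):.10f}")  # the limiting value
```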

The Condition Number for Roots

Now we apply this to root finding. Suppose $f(x^*) = 0$ and we perturb $f$ slightly, say to $\tilde{f} = f + \epsilon g$ for some bounded function $g$. The perturbed equation $\tilde{f}(\tilde{x}^*) = 0$ has a shifted root $\tilde{x}^*$. How large is the shift $|\tilde{x}^* - x^*|$ relative to the perturbation size $\epsilon$?

Theorem 1 (Condition Number for a Simple Root)

If $f(x^*) = 0$ and $f'(x^*) \neq 0$ (a simple root), then the absolute condition number of the root with respect to perturbations of $f$ is:

$$\hat{\kappa} = \frac{1}{|f'(x^*)|}$$

Proof 1

Perturb $f$ to $\tilde{f} = f + \epsilon g$. The perturbed root $\tilde{x}^*$ satisfies $\tilde{f}(\tilde{x}^*) = 0$, so:

$$f(\tilde{x}^*) + \epsilon g(\tilde{x}^*) = 0$$

Expand $f(\tilde{x}^*)$ in a Taylor series around $x^*$:

$$f(x^*) + f'(x^*)(\tilde{x}^* - x^*) + \mathcal{O}\big((\tilde{x}^* - x^*)^2\big) + \epsilon g(\tilde{x}^*) = 0$$

Since $f(x^*) = 0$:

$$f'(x^*)(\tilde{x}^* - x^*) \approx -\epsilon g(\tilde{x}^*)$$

Therefore, replacing $g(\tilde{x}^*)$ by $g(x^*)$ (accurate to leading order, since $\tilde{x}^* \to x^*$ as $\epsilon \to 0$):

$$\tilde{x}^* - x^* \approx -\frac{\epsilon g(x^*)}{f'(x^*)}$$

Taking absolute values:

$$|\tilde{x}^* - x^*| \approx \frac{|g(x^*)|}{|f'(x^*)|} \cdot \epsilon$$

The perturbation to $f$ has amplitude $\epsilon |g(x^*)|$, so the condition number is the ratio of the output change $|\tilde{x}^* - x^*|$ to the input change $\epsilon |g(x^*)|$:

$$\hat{\kappa} = \frac{|\tilde{x}^* - x^*|}{\epsilon |g(x^*)|} \approx \frac{1}{|f'(x^*)|}$$

Note that $g$ cancels entirely: the condition number depends only on $f'(x^*)$, not on the shape of the perturbation.

Remark 2 (Connection to the Lipschitz Constant)

The quantity $|f'(x^*)|$ is the local Lipschitz constant of $f$ near the root: it controls how much $f$ changes per unit change in $x$. The condition number $\hat{\kappa} = 1/|f'(x^*)|$ is its reciprocal, which is the Lipschitz constant of the inverse map from perturbations in $f$ to shifts in the root. A large Lipschitz constant for $f$ (steep crossing) means a small condition number. A small Lipschitz constant for $f$ (shallow crossing) means a large condition number.

Remark 3 (Geometric Interpretation)

The condition number has a simple geometric meaning:

  • Well-conditioned ($\hat{\kappa}$ small): $f$ crosses zero steeply. A small vertical perturbation of the curve barely moves the root horizontally.

  • Ill-conditioned ($\hat{\kappa}$ large): $f$ crosses zero at a shallow angle. The same small vertical perturbation slides the root far along the $x$-axis.

This is why nearly-tangent crossings are dangerous: the root sits on a nearly flat part of $f$, so it is easily displaced.

Figure: Conditioning of a root. Both panels show the same vertical perturbation $\epsilon$ (gray) applied to $f$ (blue band between dashed curves). Left: when $f$ crosses zero steeply ($|f'(x^*)| = 2$), the perturbed roots (red squares) barely move from the true root (black dot). Right: when $f$ crosses zero at a shallow angle ($|f'(x^*)| = 0.3$), the same perturbation slides the root much further along the $x$-axis. The horizontal shift $\Delta x^*$ is amplified by the condition number $\hat{\kappa} = 1/|f'(x^*)|$.

Examples

Example 1 (Well-Conditioned Root)

Consider $f(x) = x - 1$ with root $x^* = 1$.

Since $f'(x) = 1$, we have $f'(x^*) = 1$, so:

$$\hat{\kappa} = \frac{1}{|f'(1)|} = 1$$

This is perfectly conditioned. A perturbation of size $\epsilon$ in $f$ causes a perturbation of size $\epsilon$ in the root.

Example 2 (Ill-Conditioned Root)

Consider $f(x) = (x-1)^2 - 10^{-10}$ with roots near $x = 1$.

The roots are at $x^* = 1 \pm 10^{-5}$.

At these roots, $f'(x^*) = 2(x^* - 1) = \pm 2 \times 10^{-5}$, so:

$$\hat{\kappa} = \frac{1}{2 \times 10^{-5}} = 5 \times 10^{4}$$

A perturbation of size $\epsilon = 10^{-16}$ (roughly machine epsilon) shifts the root by at most:

$$|\Delta x^*| \leq \hat{\kappa} \cdot \epsilon = 5 \times 10^4 \times 10^{-16} = 5 \times 10^{-12}$$

We can expect to lose about 5 digits of accuracy. This is the key inequality: the root shift (forward error) is bounded by the condition number times the perturbation size (backward error).
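The bound can be checked directly. In this sketch (our own), perturbing $f$ by $\epsilon$ moves the right root from $1 + \sqrt{10^{-10}}$ to $1 + \sqrt{10^{-10} - \epsilon}$, so we can compare the exact shift against the first-order estimate $\hat{\kappa} \, \epsilon$:

```python
import numpy as np

# Perturbing f(x) = (x - 1)^2 - 1e-10 to f + eps moves the right root
# from 1 + sqrt(1e-10) to 1 + sqrt(1e-10 - eps). Compare the exact
# shift against the first-order estimate kappa * eps.
delta2 = 1e-10
eps = 1e-16

shift = np.sqrt(delta2) - np.sqrt(delta2 - eps)  # exact root shift
kappa = 1 / (2 * np.sqrt(delta2))                # 1/|f'(x*)| = 5e4

print(f"actual shift = {shift:.3e}")       # ~5.000e-12
print(f"kappa * eps  = {kappa * eps:.3e}")  # 5.000e-12
```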

Forward and Backward Error

The condition number tells us how sensitive the problem is, but in practice we face another problem: our algorithm returns an approximation $\hat{x}$ and we need to judge how good it is. The true root $x^*$ is unknown, so we cannot directly compute $|\hat{x} - x^*|$. What can we compute? We can evaluate $f(\hat{x})$, which tells us how well $\hat{x}$ satisfies the equation. But does a small $|f(\hat{x})|$ guarantee that $\hat{x}$ is close to $x^*$?

The answer, as the condition number analysis suggests, is: it depends on the conditioning. This leads to two different ways of measuring error. One measures how far $\hat{x}$ is from the true answer. The other measures how well $\hat{x}$ satisfies the equation it is supposed to solve.

Definition 2 (Forward and Backward Error for Root Finding)

Let $x^*$ be the true root of $f(x) = 0$ and let $\hat{x}$ be an approximation.

  • The forward error is $|\hat{x} - x^*|$: how far $\hat{x}$ is from the true root.

  • The backward error (also called the residual) is $|f(\hat{x})|$: how well $\hat{x}$ satisfies the equation.

The forward error is what we care about but cannot compute (we don’t know $x^*$). The backward error is easy to compute: just evaluate $f(\hat{x})$. As we will see, they are connected through the condition number.

Theorem 2 (Relating Forward and Backward Error)

Suppose $f \in \mathcal{C}^1$, $x^*$ is a simple root of $f$, and $\hat{x}$ is sufficiently close to $x^*$. Then:

$$\underbrace{|\hat{x} - x^*|}_{\text{forward error}} \leq \hat{\kappa} \cdot \underbrace{|f(\hat{x})|}_{\text{backward error}}$$

where $\hat{\kappa} = 1/|f'(x^*)|$ is the condition number of the root.

Proof 2

By the mean value theorem:

$$f(\hat{x}) = f(\hat{x}) - f(x^*) = f'(\xi)(\hat{x} - x^*)$$

for some $\xi$ in the interval $I = [x^*, \hat{x}]$ (or $[\hat{x}, x^*]$). Define $m = \min_{x \in I} |f'(x)|$ and $M = \max_{x \in I} |f'(x)|$. Since $\xi \in I$, we have $m \leq |f'(\xi)| \leq M$, so:

$$m \, |\hat{x} - x^*| \leq |f(\hat{x})| \leq M \, |\hat{x} - x^*|$$

Rearranging gives bounds on the forward error in both directions:

$$\frac{1}{M} \, |f(\hat{x})| \leq |\hat{x} - x^*| \leq \frac{1}{m} \, |f(\hat{x})|$$

As $I$ shrinks around $x^*$, both $m$ and $M$ converge to $|f'(x^*)|$, so the upper bound becomes $|\hat{x} - x^*| \leq |f(\hat{x})| / |f'(x^*)| = \hat{\kappa} \cdot |f(\hat{x})|$.

This is a fundamental principle that appears throughout numerical analysis:

$$\text{forward error} \leq \text{condition number} \times \text{backward error}$$

A small residual guarantees a small forward error only when the problem is well-conditioned. We see it here for root finding with $\hat{\kappa} = 1/|f'(x^*)|$. We will see it again for linear systems in the form $\|\Delta\mathbf{x}\|/\|\mathbf{x}\| \leq \kappa(\mathbf{A}) \cdot \|\Delta\mathbf{b}\|/\|\mathbf{b}\|$. The same structure governs eigenvalue problems, least squares, differential equations, and essentially every problem in scientific computing.
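Here is a small sketch of the inequality in action (our own illustration; $f(x) = x^2 - 2$ and the deliberately crude approximation $\hat{x} = 1.414$ are arbitrary choices):

```python
import numpy as np

# forward error <= kappa * backward error, on f(x) = x^2 - 2 with
# known root sqrt(2) and a deliberately crude approximation x_hat.
f = lambda x: x**2 - 2
root = np.sqrt(2)
x_hat = 1.414

backward = abs(f(x_hat))     # residual: computable without knowing the root
forward = abs(x_hat - root)  # true error: needs the root
kappa = 1 / abs(2 * root)    # 1/|f'(x*)|

print(f"backward error   = {backward:.3e}")
print(f"forward error    = {forward:.3e}")
print(f"kappa * backward = {kappa * backward:.3e}")  # bounds the forward error
```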

Figure: Forward vs backward error. The green arrow is the backward error (residual $|f(\hat{x})|$); the red arrow is the forward error ($|\hat{x} - x^*|$). Both panels have the same residual of 0.5. Left: the steep crossing has a small condition number ($\hat{\kappa} = 0.5$), so the forward error is 0.25, smaller than the residual. Right: the shallower crossing has a larger condition number ($\hat{\kappa} = 5$), and the same residual now gives a forward error of 2.5, ten times larger. A larger condition number means more amplification from backward to forward error.

Stopping Criteria

This relationship has immediate consequences for when to stop iterating. Recall from the step size lemma that for a contractive iteration with contraction factor $\rho < 1$:

$$|x_n - x^*| \leq \frac{|x_{n+1} - x_n|}{1 - \rho}$$

So the step size $|x_{n+1} - x_n|$ is a proxy for the forward error, and the condition number connects the residual (backward error) to the forward error. This gives us two independent ways to judge convergence.

Remark 4 (Two Stopping Criteria)

In practice, we use two stopping criteria:

  1. Residual test: stop when $|f(x_n)| < \varepsilon_1$ (small backward error)

  2. Step test: stop when $|x_{n+1} - x_n| < \varepsilon_2$ (approximate forward error)

For well-conditioned problems ($\hat{\kappa} \approx 1$), these are equivalent: a small residual guarantees a small forward error.

For ill-conditioned problems ($\hat{\kappa} \gg 1$), they can disagree dramatically. A small residual $|f(\hat{x})| = 10^{-15}$ with $\hat{\kappa} = 10^5$ means the forward error could be as large as $10^{-10}$.

Rule of thumb: always check both. If the residual is small but the iterates are still moving, the problem is ill-conditioned.
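A minimal sketch of such a loop (our own; the function, derivative, tolerances, and starting point are illustrative choices, not prescriptions from the text):

```python
def newton_both_tests(f, fprime, x0, eps_resid=1e-12, eps_step=1e-12, max_iter=50):
    """Newton iteration that stops only when BOTH criteria hold."""
    x = x0
    for _ in range(max_iter):
        x_new = x - f(x) / fprime(x)
        resid_small = abs(f(x_new)) < eps_resid  # residual (backward error) test
        step_small = abs(x_new - x) < eps_step   # step test (forward error proxy)
        x = x_new
        if resid_small and step_small:
            return x
    raise RuntimeError("both tests never passed; suspect ill-conditioning")

# Well-conditioned example: both tests agree after a few iterations.
print(newton_both_tests(lambda x: x**2 - 2, lambda x: 2 * x, 1.0))
```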

Repeated Roots

We now turn to a class of problems where ill-conditioning is intrinsic.

A root $x^*$ is called repeated (or a root of higher multiplicity) when $f$ not only vanishes at $x^*$ but also touches zero “flatly,” meaning its first several derivatives vanish there as well.

Definition 3 (Root Multiplicity)

A root $x^*$ of $f$ has multiplicity $m$ if:

$$f(x) = (x - x^*)^m h(x), \quad h(x^*) \neq 0$$

Equivalently, $f(x^*) = f'(x^*) = \cdots = f^{(m-1)}(x^*) = 0$ but $f^{(m)}(x^*) \neq 0$.

A root with $m = 1$ is called simple. A root with $m \geq 2$ is called repeated.

Figure: Simple vs repeated roots. A simple root ($m=1$, blue) crosses zero at a nonzero angle. A double root ($m=2$, orange) merely touches zero and turns around. A quadruple root ($m=4$, green) is flatter still. The higher the multiplicity, the flatter $f$ is near the root, which is precisely what makes repeated roots ill-conditioned.

The distinction matters for two reasons: conditioning and convergence speed.

Conditioning of repeated roots

For a simple root, the condition number is $\hat{\kappa} = 1/|f'(x^*)|$, which is finite. For a repeated root, $f'(x^*) = 0$, so the formula gives $\hat{\kappa} = \infty$. But what does “infinite condition number” actually mean in practice? It does not mean the root is impossible to compute. It means that the relationship between perturbation size and root shift is no longer linear.

Theorem 3 (Sensitivity of Repeated Roots)

If $x^*$ is a root of multiplicity $m$, then a perturbation $f \to f + \epsilon$ shifts the root by $\mathcal{O}(\epsilon^{1/m})$.

Proof 3

Near a root of multiplicity $m$, we have $f(x) \approx c(x - x^*)^m$ for some constant $c = f^{(m)}(x^*)/m! \neq 0$. The perturbed equation $f(\tilde{x}^*) + \epsilon = 0$ gives:

$$c(\tilde{x}^* - x^*)^m \approx -\epsilon \implies |\tilde{x}^* - x^*| \approx \left(\frac{\epsilon}{|c|}\right)^{1/m}$$

The exponent $1/m$ is the key. When $\epsilon$ is small (as in floating-point rounding), $\epsilon^{1/m} \gg \epsilon$. For example, with $\epsilon = 10^{-16}$ (roughly machine epsilon), a simple root shifts by $10^{-16}$, a double root shifts by $10^{-8}$, and a triple root shifts by $10^{-16/3} \approx 10^{-5.3}$. The following figure shows why this happens geometrically.
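The scaling is easy to verify numerically. This sketch (our own illustration; it uses `np.poly`/`np.roots`, and a perturbation of $10^{-12}$ chosen large enough that the polynomial root finder can resolve the perturbed cluster) measures the shift for $m = 1, 2, 3$:

```python
import numpy as np

# Measure the root shift when (x - 1)^m is perturbed to (x - 1)^m - eps.
# eps = 1e-12 (not 1e-16) so np.roots can resolve the perturbed cluster.
eps = 1e-12
for m in [1, 2, 3]:
    coeffs = np.poly([1.0] * m)      # descending coefficients of (x - 1)^m
    coeffs[-1] -= eps                # subtract eps from the constant term
    roots = np.roots(coeffs)
    shift = np.abs(roots - 1).max()  # largest displacement among the roots
    print(f"m = {m}:  measured shift = {shift:.2e}   eps**(1/m) = {eps ** (1/m):.2e}")
```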

Figure: Root shift under perturbation for $f(x) = (x - x^*)^m$. The solid blue curve is the original function; the dashed blue curve is the perturbed function. The black dot is the root of $f$, the red square is the root of the perturbed function, and the red line is the root shift (forward error). Left: a simple root crosses zero steeply, so the perturbation barely moves the root. Center: a double root is flat near $x^*$, so the perturbed curve’s zero crossing slides much further: the shift is $\sqrt{\epsilon}$. Right: a triple root is flatter still, and the shift grows to $\epsilon^{1/3}$. Note that for even-multiplicity roots ($m = 2$), we must shift down to create a new root; a perturbation in the other direction would lift the curve away from zero entirely, destroying the root. This is another way even-multiplicity roots are fragile: they can simply disappear under perturbation.

Convergence of Newton’s method at repeated roots

Beyond the conditioning issue, Newton’s method also converges more slowly at repeated roots.

Proposition 1 (Reduced Convergence at Repeated Roots)

When $x^*$ is a root of multiplicity $m \geq 2$, Newton’s method converges only linearly, with rate $\rho = 1 - 1/m$.

Proof 4

Write $f(x) = (x - x^*)^m h(x)$ with $h(x^*) \neq 0$. Newton’s iteration function is $g(x) = x - f(x)/f'(x)$. A calculation shows:

$$g'(x^*) = 1 - \frac{1}{m}$$

Since $g'(x^*) \neq 0$, convergence is linear with rate $\rho = |g'(x^*)| = 1 - 1/m$.

For a double root ($m = 2$), $\rho = 1/2$. For a triple root ($m = 3$), $\rho = 2/3$. As the multiplicity increases, convergence slows.
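The rate is easy to observe. A short sketch (our own) runs plain Newton on $f(x) = (x - 1)^2$, a double root, where the error ratio per step should be exactly $\rho = 1/2$:

```python
# Newton's method on f(x) = (x - 1)^2: a double root at x* = 1.
# Each step should shrink the error by rho = 1 - 1/m = 1/2.
f = lambda x: (x - 1) ** 2
fp = lambda x: 2 * (x - 1)

x = 2.0
for n in range(6):
    x_new = x - f(x) / fp(x)
    print(f"n = {n}:  error ratio = {abs(x_new - 1) / abs(x - 1):.3f}")  # -> 0.500
    x = x_new
```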

So repeated roots cause a double penalty: fewer accurate digits and slower convergence. Both problems can be mitigated if the multiplicity is known.

Remark 5 (Modified Newton for Repeated Roots)

If the multiplicity $m$ is known, the modified iteration:

$$x_{n+1} = x_n - m\,\frac{f(x_n)}{f'(x_n)}$$

restores quadratic convergence. This works because the modified iteration function $g(x) = x - m f(x)/f'(x)$ satisfies $g'(x^*) = 0$.

In practice, $m$ is rarely known in advance. An alternative is to apply Newton’s method to $u(x) = f(x)/f'(x)$, which has a simple root at $x^*$ regardless of the multiplicity of $f$. This avoids the need to know $m$, but the Newton step for $u$ involves $u' = 1 - f f''/(f')^2$, so it requires evaluating $f$, $f'$, and $f''$ at each step.
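Both remedies are easy to try. The sketch below (our own illustration) uses $f(x) = (x - 1)^2(x + 1)$, which has a double root at 1; a pure quadratic would be a degenerate test, since the modified step would hit its root exactly in one iteration:

```python
# Restoring fast convergence at the double root x* = 1 of
# f(x) = (x - 1)^2 (x + 1). Illustrative sketch, not library code.
f = lambda x: (x - 1) ** 2 * (x + 1)
fp = lambda x: (x - 1) * (3 * x + 1)  # f'(x)
fpp = lambda x: 6 * x - 2             # f''(x)

# 1) Modified Newton with known multiplicity m = 2.
x = 2.0
for n in range(4):
    x = x - 2 * f(x) / fp(x)
    print(f"modified Newton  n = {n}:  error = {abs(x - 1):.1e}")

# 2) Newton on u(x) = f(x)/f'(x), which has a simple root at 1.
#    u'(x) = 1 - f(x) f''(x) / f'(x)**2 by the quotient rule.
x = 2.0
for n in range(4):
    u = f(x) / fp(x)
    up = 1 - f(x) * fpp(x) / fp(x) ** 2
    x = x - u / up
    print(f"Newton on u      n = {n}:  error = {abs(x - 1):.1e}")
```

Both runs should show the error roughly squaring at each step, in contrast with the steady halving seen for plain Newton above.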