# Condition Numbers and Stability for Linear Systems

University of Massachusetts Amherst
The condition number $\kappa(A) = \|A\|\|A^{-1}\|$ tells us how sensitive a linear system is to perturbations. It is an intrinsic property of the problem: no algorithm can do better. When $\kappa(A) \gtrsim 1/\varepsilon_{\text{mach}}$, the matrix is numerically indistinguishable from singular.
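A tiny illustration of that threshold (the 2×2 matrix below is just an arbitrary nearly singular example):

```python
import numpy as np

# Two nearly parallel rows: the matrix is "almost" singular.
eps = np.finfo(float).eps
A = np.array([[1.0, 1.0],
              [1.0, 1.0 + 1e-15]])
print("kappa(A) =", np.linalg.cond(A))   # roughly 4e15
print("1/eps    =", 1.0 / eps)           # roughly 4.5e15: comparable, so A is numerically singular
```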
## Recap: Forward and Backward Error

We introduced forward and backward error earlier. The golden rule connects them:
$$
\text{relative forward error} \lesssim \kappa \times \text{relative backward error}
$$

The key questions for linear systems:
- What is the condition number $\kappa(A)$?
- When is a system ill-conditioned?
- Which algorithms achieve small backward error?
## Sensitivity of Linear Systems

How sensitive is the solution $\mathbf{x}$ to perturbations in $A$ and $\mathbf{b}$?
For a function $f(x)$, we perturbed the input $x$. But a linear system $A\mathbf{x} = \mathbf{b}$ has two inputs: the matrix $A$ and the vector $\mathbf{b}$. Both are subject to errors:
- $\mathbf{b}$ comes from measurements, so it always has some noise
- $A$ comes from a model, so its coefficients may be uncertain or stored with roundoff error
So we must understand how errors in both $A$ and $\mathbf{b}$ propagate to errors in $\mathbf{x}$.
For the linear system $A\mathbf{x} = \mathbf{b}$:
$$
\frac{\|\delta\mathbf{x}\|}{\|\mathbf{x}\|} \lesssim \kappa(A) \left(\frac{\|\delta A\|}{\|A\|} + \frac{\|\delta\mathbf{b}\|}{\|\mathbf{b}\|}\right)
$$

The quantity $\kappa(A) = \|A\| \|A^{-1}\|$ is the amplification factor from relative input perturbation to relative output error.
Start simple: perturb only $\mathbf{b}$.
If $A\mathbf{x} = \mathbf{b}$, then $\mathbf{x} = A^{-1}\mathbf{b}$. Perturb $\mathbf{b} \to \mathbf{b} + \delta\mathbf{b}$:
$$
\mathbf{x} + \delta\mathbf{x} = A^{-1}(\mathbf{b} + \delta\mathbf{b}) \quad \Rightarrow \quad \delta\mathbf{x} = A^{-1}\delta\mathbf{b}
$$

Taking norms: $\|\delta\mathbf{x}\| \leq \|A^{-1}\| \|\delta\mathbf{b}\|$
For relative error, we want $\|\delta\mathbf{x}\|/\|\mathbf{x}\|$ in terms of $\|\delta\mathbf{b}\|/\|\mathbf{b}\|$.
The trick: use $\|\mathbf{b}\| = \|A\mathbf{x}\| \leq \|A\| \|\mathbf{x}\|$, so $1/\|\mathbf{x}\| \leq \|A\|/\|\mathbf{b}\|$. Then
$$
\frac{\|\delta\mathbf{x}\|}{\|\mathbf{x}\|} \leq \|A^{-1}\| \|\delta\mathbf{b}\| \cdot \frac{\|A\|}{\|\mathbf{b}\|} = \underbrace{\|A\| \|A^{-1}\|}_{\kappa(A)} \cdot \frac{\|\delta\mathbf{b}\|}{\|\mathbf{b}\|}
$$

The condition number emerges naturally!
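A quick numerical check of this bound; the matrix, right-hand side, and perturbation size below are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[1.0, 1.0],
              [1.0, 1.0001]])              # kappa_2(A) is about 4e4
b = np.array([2.0, 2.0001])
x = np.linalg.solve(A, b)

db = 1e-8 * rng.standard_normal(2)          # small perturbation of b only
dx = np.linalg.solve(A, b + db) - x

actual = np.linalg.norm(dx) / np.linalg.norm(x)
bound = np.linalg.cond(A) * np.linalg.norm(db) / np.linalg.norm(b)
print(f"actual relative error {actual:.2e} <= bound {bound:.2e}")
```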
Now perturb $A$: suppose $(A + \delta A)(\mathbf{x} + \delta\mathbf{x}) = \mathbf{b}$. Expanding:
$$
\cancel{A\mathbf{x}} + A\delta\mathbf{x} + \delta A \cdot \mathbf{x} + \underbrace{\delta A \cdot \delta\mathbf{x}}_{\approx 0} = \cancel{\mathbf{b}}
$$

Solving: $\delta\mathbf{x} = -A^{-1}(\delta A \cdot \mathbf{x})$
Taking norms and forming relative errors:
$$
\frac{\|\delta\mathbf{x}\|}{\|\mathbf{x}\|} \leq \|A^{-1}\| \|A\| \cdot \frac{\|\delta A\|}{\|A\|} = \kappa(A) \cdot \frac{\|\delta A\|}{\|A\|}
$$
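The same kind of check for a perturbation of $A$ (again with illustrative data); since we dropped the second-order term, this bound holds to first order in $\|\delta A\|$:

```python
import numpy as np

rng = np.random.default_rng(1)
A = np.array([[1.0, 1.0],
              [1.0, 1.0001]])
b = np.array([2.0, 2.0001])
x = np.linalg.solve(A, b)

dA = 1e-10 * rng.standard_normal((2, 2))    # tiny perturbation of A only
dx = np.linalg.solve(A + dA, b) - x

actual = np.linalg.norm(dx) / np.linalg.norm(x)
bound = np.linalg.cond(A) * np.linalg.norm(dA, 2) / np.linalg.norm(A, 2)
print(f"actual relative error {actual:.2e} vs first-order bound {bound:.2e}")
```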
## The Condition Number

The sensitivity theorem motivates the following definition:

$$
\kappa(A) = \|A\| \cdot \|A^{-1}\|
$$

Properties:
- $\kappa(A) \geq 1$ always (equality for orthogonal matrices)
- $\kappa(I) = 1$
- $\kappa(A) = \infty$ if $A$ is singular (since $\sigma_{\min} = 0$)
- For the 2-norm: $\kappa_2(A) = \sigma_{\max}/\sigma_{\min}$ (the ratio of the largest to the smallest singular value)
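A short numerical check of these properties (the matrices below are arbitrary examples):

```python
import numpy as np

# kappa_2(A) equals sigma_max / sigma_min
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
sigma = np.linalg.svd(A, compute_uv=False)
print(np.linalg.cond(A), sigma[0] / sigma[-1])

# kappa(I) = 1
print(np.linalg.cond(np.eye(3)))

# An orthogonal matrix also has kappa_2 = 1
Q, _ = np.linalg.qr(np.random.default_rng(0).standard_normal((4, 4)))
print(np.linalg.cond(Q))
```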
Rule of thumb: expect to lose about $\log_{10}\kappa(A)$ digits of accuracy.
| Condition Number | Digits Lost |
|---|---|
| $\kappa \approx 10^k$ | $\sim k$ digits |
| $\kappa \gtrsim 1/\varepsilon_{\text{mach}} \approx 10^{16}$ | All digits |
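A quick illustration of the rule of thumb using Hilbert matrices, whose condition numbers grow rapidly with size (the sizes below are illustrative):

```python
import numpy as np
from scipy.linalg import hilbert

for n in (4, 8, 12):
    A = hilbert(n)
    x_true = np.ones(n)
    x_hat = np.linalg.solve(A, A @ x_true)   # right-hand side built so x_true is the exact solution
    rel_err = np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true)
    kappa = np.linalg.cond(A)
    # For n = 12 the condition number exceeds 1/eps, so essentially all digits are lost.
    print(f"n={n:2d}  kappa={kappa:.1e}  log10(kappa)={np.log10(kappa):5.1f}  "
          f"correct digits~{-np.log10(rel_err):5.1f}")
```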
But what does it mean for a matrix to be ill-conditioned? The next section provides the key insight.
## The Deep Insight: Numerically Singular Matrices

### Residuals and Backward Error

For linear systems, backward error has a simple form:
The computed solution $\hat{\mathbf{x}}$ exactly solves $A\hat{\mathbf{x}} = \mathbf{b} - \mathbf{r}$, where $\mathbf{r} = \mathbf{b} - A\hat{\mathbf{x}}$ is the residual. The relative backward error is $\|\mathbf{r}\|/\|\mathbf{b}\|$.
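A small illustration (the Hilbert matrix is an assumed example): even when $A$ is badly conditioned, a good solver typically returns a tiny residual, i.e. a tiny backward error, while the forward error is roughly $\kappa(A)\,\varepsilon_{\text{mach}}$ times larger:

```python
import numpy as np
from scipy.linalg import hilbert

n = 12
A = hilbert(n)                    # kappa_2(A) is about 1e16
x_true = np.ones(n)
b = A @ x_true

x_hat = np.linalg.solve(A, b)
r = b - A @ x_hat                 # residual of the computed solution

backward = np.linalg.norm(r) / np.linalg.norm(b)
forward = np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true)
print("relative backward error:", backward)                       # ~ eps_mach
print("relative forward  error:", forward)                        # ~ kappa(A) * eps_mach
print("kappa(A) * eps         :", np.linalg.cond(A) * np.finfo(float).eps)
```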
## Practical Guideline: Always Check the Condition Number

Since the forward error satisfies
$$
\frac{\|\hat{\mathbf{x}} - \mathbf{x}\|}{\|\mathbf{x}\|} \lesssim \kappa(A) \cdot \varepsilon_{\text{mach}},
$$

we should always estimate $\kappa(A)$ before trusting the solution. But how?
### The Challenge

Computing $\kappa(A) = \|A\| \|A^{-1}\|$ exactly requires $A^{-1}$, which costs $O(n^3)$ operations, as expensive as solving the system itself. We need a cheaper approach.
### Hager's Algorithm: A Clever Trick

The key insight (Hager, 1984; refined by Higham) is that we can estimate $\|A^{-1}\|$ using only a few solves with the already-factored matrix.
Cost: each iteration requires two triangular solves. It typically converges in 2–5 iterations, so the total cost is $O(n^2)$, much cheaper than the $O(n^3)$ factorization.
See the Condition Number Estimation notebook for a Python implementation.
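In the meantime, here is a compact sketch of the idea behind the 1-norm estimator (this is not the notebook's code; the function name and iteration limit are illustrative). It reuses a single LU factorization, so each iteration costs only a pair of triangular solves:

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def condest_1norm(A, maxiter=5):
    """Hager-style estimate of kappa_1(A) = ||A||_1 * ||A^{-1}||_1 without forming A^{-1}."""
    n = A.shape[0]
    lu_piv = lu_factor(A)                   # O(n^3) once; every solve below is O(n^2)
    x = np.full(n, 1.0 / n)                 # start on the unit 1-norm ball
    est = 0.0
    for _ in range(maxiter):
        y = lu_solve(lu_piv, x)             # y = A^{-1} x
        est = np.linalg.norm(y, 1)          # current estimate of ||A^{-1}||_1
        xi = np.sign(y)
        xi[xi == 0] = 1.0
        z = lu_solve(lu_piv, xi, trans=1)   # z = A^{-T} xi  (a subgradient of x -> ||A^{-1} x||_1)
        j = int(np.argmax(np.abs(z)))
        if np.abs(z[j]) <= z @ x:           # no direction of increase: converged
            break
        x = np.zeros(n)
        x[j] = 1.0                          # move to the most promising unit vector e_j
    return np.linalg.norm(A, 1) * est

# Usage: compare against the exact 1-norm condition number.
A = np.random.default_rng(1).standard_normal((200, 200))
print("estimate:", condest_1norm(A))
print("exact   :", np.linalg.cond(A, 1))
```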
### What LAPACK Does

LAPACK's condition-estimation routines (e.g., dgecon) implement this automatically, and SciPy's dense linear solvers use it behind the scenes to warn about ill-conditioned matrices. (By contrast, np.linalg.cond(A) computes the condition number directly, via the SVD for the 2-norm, rather than estimating it.)
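A small sketch of that behavior, assuming a recent SciPy in which scipy.linalg.solve runs the LAPACK estimator and emits a LinAlgWarning for ill-conditioned inputs:

```python
import warnings

import numpy as np
from scipy.linalg import LinAlgWarning, hilbert, solve

A = hilbert(14)                 # condition number around 1e17: numerically singular
b = A @ np.ones(14)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always", LinAlgWarning)
    x = solve(A, b)             # solves the system, but flags the conditioning

for w in caught:
    print(w.category.__name__, "->", w.message)
```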
## Stability of Algorithms: A Preview

An algorithm is backward stable if it produces a solution with backward error $\sim \varepsilon_{\text{mach}}$. Combined with the golden rule:
$$
\text{forward error} \lesssim \kappa(A) \cdot \varepsilon_{\text{mach}}
$$

This is the best we can hope for: any algorithm must contend with the condition number.
An algorithm is forward stable if it achieves forward error $\lesssim \kappa(A) \cdot \varepsilon_{\text{mach}}$ directly.
Backward stability implies forward stability (via the golden rule), but it is a stronger guarantee: it tells us the computed answer is the exact answer to a nearby problem. This is more useful because it is a statement about the algorithm alone, independent of how well or badly conditioned the particular problem happens to be.
Coming up, we'll see that:

- Householder QR is backward stable (the gold standard)
- LU with partial pivoting is backward stable in practice (with caveats)
- Classical Gram-Schmidt is not stable: its loss of orthogonality grows with $\kappa(A)$ (see the sketch below for a quick numerical preview)
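A quick numerical preview of that last point (a sketch; the test matrix and sizes are illustrative). Classical Gram-Schmidt loses orthogonality on an ill-conditioned matrix, while Householder QR, which is what np.linalg.qr uses, keeps $Q$ orthogonal to roughly machine precision:

```python
import numpy as np

def classical_gram_schmidt(A):
    """Classical (unstable) Gram-Schmidt QR: each column is projected against
    the ORIGINAL previous columns rather than the running residual."""
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for j in range(n):
        v = A[:, j].copy()
        for i in range(j):
            R[i, j] = Q[:, i] @ A[:, j]
            v -= R[i, j] * Q[:, i]
        R[j, j] = np.linalg.norm(v)
        Q[:, j] = v / R[j, j]
    return Q, R

rng = np.random.default_rng(0)
n = 50
# Build a test matrix with condition number about 1e10 from random orthogonal factors.
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = U @ np.diag(np.logspace(0, -10, n)) @ V.T

Q_cgs, _ = classical_gram_schmidt(A)
Q_hh, _ = np.linalg.qr(A)
I = np.eye(n)
print("CGS         ||Q^T Q - I|| =", np.linalg.norm(Q_cgs.T @ Q_cgs - I))  # large: orthogonality lost
print("Householder ||Q^T Q - I|| =", np.linalg.norm(Q_hh.T @ Q_hh - I))    # ~ machine epsilon
```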