
Note

This page serves as a quick reference summarizing the key theorems and concepts that appear throughout the course. Use it for review or to see how different topics connect.

Big Idea

These are the fundamental concepts that unify the entire course: approximation theorems justify what we’re doing, while error analysis tells us how well we’re doing it.

Foundational Theorems

Theorem 1 (Taylor’s Theorem)

If $f$ is $(n+1)$-times differentiable, then:

$$f(x) = \sum_{k=0}^{n} \frac{f^{(k)}(a)}{k!}(x-a)^k + \frac{f^{(n+1)}(\xi)}{(n+1)!}(x-a)^{n+1}$$

for some $\xi$ between $a$ and $x$.

Why it matters: This is THE foundation of numerical analysis.

  • Truncation error: The remainder term tells us the error when we truncate

  • Finite differences: Derived by Taylor expanding $f(x+h)$

  • Newton’s method: Comes from linearizing $f$ (first-order Taylor)

  • Convergence order: Higher-order methods use more Taylor terms
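The truncation-error and finite-difference bullets can be seen numerically. A minimal sketch (the choice of $f = \sin$ and $x = 1$ is arbitrary): Taylor expansion predicts the forward difference is $O(h)$ accurate while the central difference is $O(h^2)$, so shrinking $h$ tenfold should shrink their errors by roughly 10 and 100.

```python
import math

# Truncation error from Taylor's theorem:
# forward difference error ~ (h/2) f''(xi)      -> O(h)
# central difference error ~ (h^2/6) f'''(xi)   -> O(h^2)
f, x, exact = math.sin, 1.0, math.cos(1.0)

def forward_err(h):
    return abs((f(x + h) - f(x)) / h - exact)

def central_err(h):
    return abs((f(x + h) - f(x - h)) / (2 * h) - exact)

for h in (1e-2, 1e-3):
    print(f"h={h:.0e}  forward={forward_err(h):.2e}  central={central_err(h):.2e}")
```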

Theorem 2 (Weierstrass Approximation Theorem)

Let $f$ be continuous on $[a, b]$. For any $\varepsilon > 0$, there exists a polynomial $p$ such that:

$$\|f - p\|_\infty < \varepsilon$$

Why it matters: Justifies the entire polynomial approximation enterprise. Continuous functions CAN be approximated by polynomials. The question is how efficiently.

Warning

Weierstrass guarantees existence but says nothing about:

  • Which polynomial degree is needed

  • Which nodes to use for interpolation

  • How to construct the polynomial efficiently

These are the questions numerical analysis answers.
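One classical constructive route is the Bernstein-polynomial proof of Weierstrass: $B_n(f)(x) = \sum_{k=0}^{n} f(k/n)\binom{n}{k}x^k(1-x)^{n-k}$ converges uniformly to $f$ on $[0,1]$. A minimal sketch, with the non-smooth test function $f(x) = \lvert x - 1/2 \rvert$ chosen arbitrarily; the slow $O(1/\sqrt{n})$ convergence illustrates exactly the efficiency question raised above.

```python
import math

# Bernstein polynomial B_n(f): a constructive witness for Weierstrass.
def bernstein(f, n, x):
    return sum(f(k / n) * math.comb(n, k) * x**k * (1 - x)**(n - k)
               for k in range(n + 1))

f = lambda x: abs(x - 0.5)  # continuous but not differentiable at 1/2

def max_err(n, points=101):
    grid = (i / (points - 1) for i in range(points))
    return max(abs(bernstein(f, n, x) - f(x)) for x in grid)

for n in (10, 40, 160):
    print(n, max_err(n))  # uniform error shrinks, but only like 1/sqrt(n)
```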

Theorem 3 (Best Approximation Theorem)

Let $V$ be an inner product space and $W \subset V$ a finite-dimensional subspace. For any $v \in V$, there exists a unique $w^* \in W$ minimizing $\|v - w\|$ over $w \in W$.

The best approximation is characterized by orthogonality:

$$\langle v - w^*, w \rangle = 0 \quad \text{for all } w \in W$$

Why it matters:

  • Least squares: $A^T(b - Ax) = 0$ is exactly this orthogonality condition

  • Fourier/Chebyshev: Coefficients come from projecting onto basis functions

  • Why QR works: Orthogonal projection onto $\text{Col}(A)$
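The least-squares bullet can be checked directly: the residual of a least-squares solution is numerically orthogonal to the column space of $A$. A minimal sketch assuming NumPy, with arbitrary sizes and a fixed random seed:

```python
import numpy as np

# Orthogonality characterization of least squares:
# the residual b - A x* satisfies A^T (b - A x*) = 0.
rng = np.random.default_rng(0)
A = rng.standard_normal((8, 3))   # overdetermined: 8 equations, 3 unknowns
b = rng.standard_normal(8)

x_star, *_ = np.linalg.lstsq(A, b, rcond=None)
residual = b - A @ x_star
print(np.linalg.norm(A.T @ residual))  # ~ 0 up to rounding
```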

Forward and Backward Error

Definition 1 (Forward and Backward Error)

For a problem $f: X \to Y$ and computed solution $\hat{y}$:

Forward error: How far is $\hat{y}$ from the true answer?

$$\text{forward error} = \|\hat{y} - f(x)\|$$

Backward error: For what perturbed input would $\hat{y}$ be the exact answer?

$$\text{backward error} = \min \{ \|\delta x\| : f(x + \delta x) = \hat{y} \}$$

Definition 2 (Condition Number)

The condition number measures sensitivity of the problem to input perturbations:

$$\kappa = \lim_{\delta \to 0} \sup_{\|\delta x\| \leq \delta} \frac{\|f(x + \delta x) - f(x)\| / \|f(x)\|}{\|\delta x\| / \|x\|}$$

For an invertible matrix $A$:

$$\kappa(A) = \|A\| \cdot \|A^{-1}\|$$

For the 2-norm: $\kappa_2(A) = \sigma_{\max}(A) / \sigma_{\min}(A)$.
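The two formulas for $\kappa_2$ agree, as a quick sketch confirms (the $2 \times 2$ matrix here is an arbitrary example):

```python
import numpy as np

# kappa_2(A) = ||A||_2 * ||A^{-1}||_2 = sigma_max / sigma_min.
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])

sigma = np.linalg.svd(A, compute_uv=False)       # singular values, descending
kappa_svd = sigma[0] / sigma[-1]
kappa_norms = np.linalg.norm(A, 2) * np.linalg.norm(np.linalg.inv(A), 2)
print(kappa_svd, kappa_norms, np.linalg.cond(A, 2))  # all three coincide
```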

Theorem 4 (The Fundamental Error Relationship)

$$\frac{\|\text{forward error}\|}{\|\text{true solution}\|} \leq \kappa \cdot \frac{\|\text{backward error}\|}{\|\text{input}\|}$$

Or more simply:

$$\text{relative forward error} \lesssim \kappa \times \text{relative backward error}$$

Backward stable algorithm + well-conditioned problem = accurate answer.
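A sketch of the relationship in action, assuming NumPy. LU-based `np.linalg.solve` is backward stable, so the residual-based backward error stays near machine precision, while the forward error is inflated by the conditioning of the (notoriously ill-conditioned) $10 \times 10$ Hilbert matrix:

```python
import numpy as np

# relative forward error <~ kappa * relative backward error
n = 10
H = np.array([[1.0 / (i + j + 1) for j in range(n)] for i in range(n)])
x_true = np.ones(n)
b = H @ x_true

x_hat = np.linalg.solve(H, b)
rel_forward = np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true)
# Normwise backward error via the residual: ||r|| / (||H|| ||x_hat||).
rel_backward = (np.linalg.norm(b - H @ x_hat)
                / (np.linalg.norm(H, 2) * np.linalg.norm(x_hat)))
kappa = np.linalg.cond(H, 2)
print(f"forward {rel_forward:.1e}  backward {rel_backward:.1e}  kappa {kappa:.1e}")
```

Tiny backward error, large forward error: the algorithm did its job, and conditioning did the damage.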

Remark 1 (The Error Analysis Philosophy)

This decomposition separates concerns:

  • Algorithm designers aim for backward stability (small backward error)

  • Problem formulators aim for well-conditioned problems (small $\kappa$)

  • Users should check conditioning before trusting results

A backward stable algorithm gives the exact answer to a slightly wrong problem. If the problem is well-conditioned, “slightly wrong problem” means “nearly correct answer.”

The Contraction Mapping Principle

Theorem 5 (Banach Fixed Point Theorem)

Let $(X, d)$ be a complete metric space and $T: X \to X$ a contraction mapping, i.e., there exists $q \in [0, 1)$ such that

$$d(T(x), T(y)) \leq q \cdot d(x, y) \quad \text{for all } x, y \in X.$$

Then:

  1. $T$ has a unique fixed point $x^* \in X$

  2. For any starting point $x_0 \in X$, the iteration $x_{n+1} = T(x_n)$ converges to $x^*$

  3. The convergence is geometric: $d(x_n, x^*) \leq q^n \cdot d(x_0, x^*)$

  4. A priori bound: $d(x_n, x^*) \leq \frac{q^n}{1-q} d(x_0, x_1)$

  5. A posteriori bound: $d(x_n, x^*) \leq \frac{q}{1-q} d(x_{n-1}, x_n)$

Proof 1

Existence and convergence: The sequence $\{x_n\}$ is Cauchy. For $m > n$:

$$d(x_n, x_m) \leq \sum_{k=n}^{m-1} d(x_k, x_{k+1}) \leq \sum_{k=n}^{m-1} q^k d(x_0, x_1) \leq \frac{q^n}{1-q} d(x_0, x_1)$$

Since $q < 1$, this tends to 0 as $n \to \infty$. By completeness, $x^* = \lim_{n \to \infty} x_n$ exists.

Taking limits in $x_{n+1} = T(x_n)$ and using continuity of $T$ (contractions are Lipschitz, hence continuous):

$$x^* = \lim_{n \to \infty} x_{n+1} = \lim_{n \to \infty} T(x_n) = T\left(\lim_{n \to \infty} x_n\right) = T(x^*)$$

Uniqueness: If $T(x^*) = x^*$ and $T(y^*) = y^*$, then

$$d(x^*, y^*) = d(T(x^*), T(y^*)) \leq q \cdot d(x^*, y^*)$$

Since $q < 1$, this forces $d(x^*, y^*) = 0$, so $x^* = y^*$.

Convergence rate: $d(x_n, x^*) = d(T(x_{n-1}), T(x^*)) \leq q \cdot d(x_{n-1}, x^*) \leq q^n d(x_0, x^*)$.

A posteriori bound: Let $m \to \infty$ in the Cauchy estimate with $n$ replaced by $n-1$.

Remark 2 (Where We Use the Banach Fixed Point Theorem)

This theorem is THE foundation for iterative methods throughout scientific computing:

| Application | Space $X$ | Contraction $T$ |
| --- | --- | --- |
| Fixed-point iteration | $\mathbb{R}$ or $[a,b]$ | $g(x)$ with $\lvert g' \rvert_\infty < 1$ |
| Newton’s method (local) | Neighborhood of root | Newton map near simple root |
| Picard iteration for ODEs | $C([0,T])$ | Integral operator |
| Iterative linear solvers | $\mathbb{R}^n$ | $x \mapsto Mx + c$ with $\rho(M) < 1$ |
| Implicit function theorem | Banach space | Contraction from IFT proof |

The a posteriori bound is particularly useful: it gives a computable error estimate from consecutive iterates.
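As a sketch of that computable estimate, consider fixed-point iteration for $g = \cos$ on $[0, 1]$, a contraction with $q = \sin 1$ (since $\lvert g'(x) \rvert = \lvert \sin x \rvert \leq \sin 1$ there). The a posteriori bound built from two consecutive iterates does bound the true error:

```python
import math

# Fixed-point iteration x_{n+1} = cos(x_n) on [0, 1]; q = sin(1) < 1.
g = math.cos
q = math.sin(1.0)
x_star = 0.7390851332151607  # the Dottie number: cos(x*) = x*

x = 1.0
for _ in range(30):
    x_prev, x = x, g(x)

# A posteriori bound: d(x_n, x*) <= q/(1-q) * d(x_{n-1}, x_n).
a_posteriori = q / (1 - q) * abs(x - x_prev)
true_err = abs(x - x_star)
print(true_err, a_posteriori)  # true error, and its computable bound
```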

The Neumann Series

Theorem 6 (Neumann Series)

Let $\mathcal{A}$ be a unital Banach algebra with identity $1$ and norm $\|\cdot\|$. If $a \in \mathcal{A}$ satisfies $\|a\| < 1$, then $1 - a$ is invertible and

$$(1 - a)^{-1} = \sum_{k=0}^{\infty} a^k = 1 + a + a^2 + a^3 + \cdots$$

More generally, if $\mathcal{B}(X)$ denotes the bounded linear operators on a Banach space $X$ and $A \in \mathcal{B}(X)$ has spectral radius $\rho(A) < 1$, then $(I - A)^{-1} = \sum_{k=0}^\infty A^k$.

Proof 2

Convergence: Since $\|a^k\| \leq \|a\|^k$ (submultiplicativity of the norm in a Banach algebra) and $\|a\| < 1$, the series $\sum_{k=0}^\infty a^k$ converges absolutely, hence converges in the complete space $\mathcal{A}$.

Verification: The partial sums $S_n = \sum_{k=0}^n a^k$ satisfy

$$(1 - a) S_n = S_n (1 - a) = 1 - a^{n+1}$$

As $n \to \infty$, $\|a^{n+1}\| \leq \|a\|^{n+1} \to 0$, so $(1 - a) \sum_{k=0}^\infty a^k = 1$.

Spectral radius version: Gelfand’s formula gives $\rho(A) = \lim_{k \to \infty} \|A^k\|^{1/k}$. If $\rho(A) < 1$, then for any $r$ with $\rho(A) < r < 1$ we have $\|A^k\| \leq C r^k$ for large $k$, ensuring convergence.

Remark 3 (Where We Use the Neumann Series)

The Neumann series is the operator-theoretic generalization of the geometric series $\frac{1}{1-r} = \sum_{k=0}^\infty r^k$ for $|r| < 1$.

| Application | Banach Algebra | Element $a$ |
| --- | --- | --- |
| Iterative linear solvers | $\mathbb{R}^{n \times n}$ | Iteration matrix $M$ |
| Perturbation of inverses | $\mathcal{B}(X)$ | $A^{-1}\Delta A$ for perturbed $A + \Delta A$ |
| Integral equations | $\mathcal{B}(L^2)$ | Integral operator $K$ |
| ODE stability | $\mathbb{C}^{n \times n}$ | $hA$ in implicit methods |
| Resolvent | $\mathcal{B}(X)$ | $\lambda^{-1}A$ for $\lvert\lambda\rvert > \rho(A)$ |

Key insight: The condition $\rho(M) < 1$ (not $\|M\| < 1$) is necessary and sufficient for convergence. This is why the spectral radius, not the norm, controls iterative methods.
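A sketch of that insight, assuming NumPy: a non-normal iteration matrix with $\rho(M) = 0.5$ but $\|M\|_2 > 10$ still has a convergent Neumann series, and its partial sums reach $(I - M)^{-1}$:

```python
import numpy as np

# Non-normal M: tiny spectral radius, huge norm.
M = np.array([[0.5, 10.0],
              [0.0,  0.5]])
print(np.linalg.norm(M, 2), max(abs(np.linalg.eigvals(M))))  # >10 vs 0.5

# Partial sums of the Neumann series sum_k M^k.
S = np.zeros_like(M)
P = np.eye(2)
for _ in range(200):
    S += P        # S accumulates I + M + ... + M^(k-1)
    P = P @ M     # P holds the next power M^k

err = np.linalg.norm(S - np.linalg.inv(np.eye(2) - M))
print(err)  # the series converged despite ||M|| >> 1
```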

Corollary 1 (Perturbation of Inverses)

If $A$ is invertible and $\|A^{-1}\| \cdot \|\Delta A\| < 1$, then $A + \Delta A$ is invertible and

$$\|(A + \Delta A)^{-1} - A^{-1}\| \leq \frac{\|A^{-1}\|^2 \|\Delta A\|}{1 - \|A^{-1}\| \|\Delta A\|}$$

Proof 3

Write $A + \Delta A = A(I + A^{-1}\Delta A)$. Since $\|A^{-1}\Delta A\| \leq \|A^{-1}\| \|\Delta A\| < 1$, the Neumann series gives

$$(A + \Delta A)^{-1} = (I + A^{-1}\Delta A)^{-1} A^{-1} = \sum_{k=0}^\infty (-A^{-1}\Delta A)^k A^{-1}$$

The bound follows from the Neumann-series estimate $\|(I + A^{-1}\Delta A)^{-1} - I\| \leq \frac{\|A^{-1}\Delta A\|}{1 - \|A^{-1}\Delta A\|}$ together with a final factor of $\|A^{-1}\|$.
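The corollary can be sanity-checked numerically; a minimal sketch assuming NumPy, with an arbitrary well-conditioned $A$ near the identity and a small random perturbation:

```python
import numpy as np

# Check: ||(A+dA)^{-1} - A^{-1}|| <= ||A^{-1}||^2 ||dA|| / (1 - ||A^{-1}|| ||dA||)
rng = np.random.default_rng(1)
A = np.eye(4) + 0.1 * rng.standard_normal((4, 4))   # well-conditioned
dA = 1e-3 * rng.standard_normal((4, 4))             # small perturbation

Ainv = np.linalg.inv(A)
t = np.linalg.norm(Ainv, 2) * np.linalg.norm(dA, 2)  # must be < 1
lhs = np.linalg.norm(np.linalg.inv(A + dA) - Ainv, 2)
rhs = np.linalg.norm(Ainv, 2) ** 2 * np.linalg.norm(dA, 2) / (1 - t)
print(lhs, rhs)  # lhs stays below the theoretical bound
```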