# Iterative Methods for Linear Systems

University of Massachusetts Amherst
Many numerical problems reduce to finding $\mathbf{x}$ satisfying $\mathbf{x} = M\mathbf{x} + \mathbf{c}$. This is a fixed point problem. Three perspectives (**Banach contraction**, **Neumann series**, and **matrix splitting**) give the same answer and reveal deep connections across numerical analysis.
## The Unifying Framework

Consider solving the linear system $A\mathbf{x} = \mathbf{b}$. Instead of direct factorization, we can reformulate it as a fixed point problem:
$$\mathbf{x} = M\mathbf{x} + \mathbf{c}$$

and iterate:
$$\mathbf{x}^{(k+1)} = M\mathbf{x}^{(k)} + \mathbf{c}$$

When does this converge? Three equivalent perspectives:
### Perspective 1: Banach Fixed Point Theorem

Remark 1 (Connection to Scalar Case)
This is the matrix version of the Banach Fixed Point Theorem from nonlinear equations!
| Scalar | Matrix |
|---|---|
| $x_{n+1} = g(x_n)$ | $\mathbf{x}^{(k+1)} = M\mathbf{x}^{(k)} + \mathbf{c}$ |
| Converges if $\lvert g'(x^*) \rvert < 1$ | Converges if $\rho(M) < 1$ |
| Rate: $\lvert e_{n+1} \rvert \approx \lvert g'(x^*) \rvert\,\lvert e_n \rvert$ | Rate: $\lVert \mathbf{e}^{(k+1)} \rVert \le \lVert M \rVert\,\lVert \mathbf{e}^{(k)} \rVert$ |
The unifying principle: contractions converge **geometrically**.
### Perspective 2: Neumann Series

Rearrange $\mathbf{x} = M\mathbf{x} + \mathbf{c}$ to get $(I - M)\mathbf{x} = \mathbf{c}$, so:
$$\mathbf{x} = (I - M)^{-1}\mathbf{c}$$

When does this inverse exist? The Neumann series gives the answer: if $\rho(M) < 1$, then $I - M$ is invertible and $(I - M)^{-1} = \sum_{j=0}^{\infty} M^j$.
Proof 1
Let $S_n = I + M + M^2 + \cdots + M^n$. Then:
$$(I - M)S_n = I - M^{n+1}$$

If $\rho(M) < 1$, then $M^{n+1} \to 0$ as $n \to \infty$, so $S_n \to (I - M)^{-1}$.
This is the matrix analog of the geometric series:
$$\frac{1}{1-r} = 1 + r + r^2 + \cdots \quad \text{for } |r| < 1$$

### Perspective 3: Iterates Are Neumann Partial Sums

Here's the beautiful connection: the fixed point iterates compute the Neumann series **term by term**.
Proof 2
Expand the recurrence $\mathbf{x}^{(k+1)} = M\mathbf{x}^{(k)} + \mathbf{c}$:
$$
\begin{aligned}
\mathbf{x}^{(1)} &= M\mathbf{x}^{(0)} + \mathbf{c} \\
\mathbf{x}^{(2)} &= M\mathbf{x}^{(1)} + \mathbf{c} = M^2\mathbf{x}^{(0)} + M\mathbf{c} + \mathbf{c} \\
\mathbf{x}^{(3)} &= M\mathbf{x}^{(2)} + \mathbf{c} = M^3\mathbf{x}^{(0)} + M^2\mathbf{c} + M\mathbf{c} + \mathbf{c}
\end{aligned}
$$

By induction: $\mathbf{x}^{(k)} = M^k \mathbf{x}^{(0)} + \sum_{j=0}^{k-1} M^j \mathbf{c}$.
As $k \to \infty$ with $\rho(M) < 1$, the term $M^k \mathbf{x}^{(0)}$ vanishes, so $\mathbf{x}^{(k)} \to \sum_{j=0}^{\infty} M^j \mathbf{c} = (I - M)^{-1}\mathbf{c}$, the solution of the fixed point equation.
**Special case:** If $\mathbf{x}^{(0)} = \mathbf{0}$, then $\mathbf{x}^{(k)} = \left(\sum_{j=0}^{k-1} M^j\right)\mathbf{c}$: the iterates are exactly the Neumann partial sums applied to $\mathbf{c}$.
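A minimal numerical check of this identity, assuming NumPy is available; the $2 \times 2$ matrix `M` below is an arbitrary example with $\rho(M) < 1$:

```python
import numpy as np

# Example contraction: rho(M) < 1 (checked below)
M = np.array([[0.2, 0.1],
              [0.3, 0.4]])
c = np.array([1.0, 2.0])
assert max(abs(np.linalg.eigvals(M))) < 1

x = np.zeros(2)                      # start from x^(0) = 0
partial_sum = np.zeros((2, 2))       # accumulates I + M + M^2 + ...
for k in range(10):
    x = M @ x + c                    # x^{(k+1)} = M x^{(k)} + c
    partial_sum += np.linalg.matrix_power(M, k)
    # the iterate equals the Neumann partial sum applied to c
    assert np.allclose(x, partial_sum @ c)

print(x, np.linalg.solve(np.eye(2) - M, c))  # both approach (I - M)^{-1} c
```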
## Matrix Splittings for $A\mathbf{x} = \mathbf{b}$

To solve $A\mathbf{x} = \mathbf{b}$, split $A = N - P$ where $N$ is easy to invert:
$$N\mathbf{x} = P\mathbf{x} + \mathbf{b} \implies \mathbf{x} = N^{-1}P\mathbf{x} + N^{-1}\mathbf{b} = M\mathbf{x} + \mathbf{c}$$

where $M = N^{-1}P$ is the **iteration matrix** and $\mathbf{c} = N^{-1}\mathbf{b}$.
Convergence $\iff$ $\rho(M) < 1$ $\iff$ the Neumann series for $(I - M)^{-1}$ converges.
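A sketch of the splitting framework in code (NumPy assumed; the matrix `A` is an illustrative example, and `N` is taken to be the diagonal of `A`, which anticipates the Jacobi method below):

```python
import numpy as np

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 4.0, 1.0],
              [0.0, 1.0, 4.0]])
b = np.array([1.0, 2.0, 3.0])

# Splitting A = N - P with N = diag(A)
N = np.diag(np.diag(A))
P = N - A
M = np.linalg.solve(N, P)        # iteration matrix M = N^{-1} P
c = np.linalg.solve(N, b)        # c = N^{-1} b

rho = max(abs(np.linalg.eigvals(M)))
print("spectral radius:", rho)   # < 1 here, so the iteration converges

x = np.zeros_like(b)
for _ in range(50):
    x = M @ x + c
print(np.allclose(x, np.linalg.solve(A, b)))  # True
```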
### Jacobi Method

Split $A = D - (L + U)$, where $D$ is the diagonal, $L$ the strict lower part, and $U$ the strict upper part.
Input: $A$, $\mathbf{b}$, initial guess $\mathbf{x}^{(0)}$, tolerance $\varepsilon$
Output: approximate solution $\mathbf{x}$

for $k = 0, 1, 2, \ldots$:
$\qquad$ for $i = 1, \ldots, n$:
$\qquad\qquad x_i^{(k+1)} = \dfrac{1}{a_{ii}}\left(b_i - \sum_{j \neq i} a_{ij} x_j^{(k)}\right)$
$\qquad$ if $\|\mathbf{x}^{(k+1)} - \mathbf{x}^{(k)}\| < \varepsilon$: return $\mathbf{x}^{(k+1)}$
Key property: all components are updated using values from the previous iteration, so the updates are **parallelizable**.
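A minimal NumPy sketch of this algorithm, with the inner loop over $i$ vectorized; the function name and defaults are illustrative:

```python
import numpy as np

def jacobi(A, b, x0=None, tol=1e-8, max_iter=500):
    """Jacobi iteration for Ax = b; assumes nonzero diagonal entries."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    x = np.zeros_like(b) if x0 is None else np.array(x0, dtype=float)
    D = np.diag(A)                  # diagonal entries a_ii
    R = A - np.diagflat(D)          # off-diagonal part (the sum over j != i)
    for _ in range(max_iter):
        x_new = (b - R @ x) / D     # every component uses the previous iterate
        if np.linalg.norm(x_new - x, ord=np.inf) < tol:
            return x_new
        x = x_new
    return x
```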
### Gauss-Seidel Method

Use updated values immediately as they become available.
Input: $A$, $\mathbf{b}$, initial guess $\mathbf{x}^{(0)}$, tolerance $\varepsilon$
Output: approximate solution $\mathbf{x}$

for $k = 0, 1, 2, \ldots$:
$\qquad$ for $i = 1, \ldots, n$:
$\qquad\qquad x_i^{(k+1)} = \dfrac{1}{a_{ii}}\left(b_i - \sum_{j < i} a_{ij} x_j^{(k+1)} - \sum_{j > i} a_{ij} x_j^{(k)}\right)$
$\qquad$ if $\|\mathbf{x}^{(k+1)} - \mathbf{x}^{(k)}\| < \varepsilon$: return $\mathbf{x}^{(k+1)}$
Key property: uses the new values for $j < i$, which typically gives faster convergence but is **not parallelizable**.
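A corresponding NumPy sketch; the components of `x` are overwritten in place, so the values for $j < i$ are already the new ones:

```python
import numpy as np

def gauss_seidel(A, b, x0=None, tol=1e-8, max_iter=500):
    """Gauss-Seidel iteration for Ax = b; updates components in place."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    n = len(b)
    x = np.zeros_like(b) if x0 is None else np.array(x0, dtype=float)
    for _ in range(max_iter):
        x_old = x.copy()
        for i in range(n):
            # entries j < i already hold x^{(k+1)}, entries j > i still hold x^{(k)}
            s = A[i, :i] @ x[:i] + A[i, i+1:] @ x[i+1:]
            x[i] = (b[i] - s) / A[i, i]
        if np.linalg.norm(x - x_old, ord=np.inf) < tol:
            return x
    return x
```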
### Comparison

| Method | Iteration Matrix $M$ | Parallelizable | Typical Convergence |
|---|---|---|---|
| Jacobi | $D^{-1}(L + U)$ | Yes | Slower |
| Gauss-Seidel | $(D - L)^{-1}U$ | No | Faster |
| SOR | Modified G-S with $\omega$ | No | Tunable |
## Convergence Conditions

### Spectral Radius

The iteration $\mathbf{x}^{(k+1)} = M\mathbf{x}^{(k)} + \mathbf{c}$ converges for all starting points if and only if $\rho(M) < 1$.
The convergence rate is:
$$\|\mathbf{x}^{(k)} - \mathbf{x}^*\| \approx C \cdot \rho(M)^k$$

Number of iterations to reach accuracy $\varepsilon$: $k \approx \dfrac{\log(1/\varepsilon)}{\log(1/\rho(M))}$.
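A quick numerical illustration of this estimate (NumPy assumed; the iteration matrix is an arbitrary example):

```python
import numpy as np

M = np.array([[0.0, 0.5],
              [0.4, 0.0]])            # example iteration matrix
rho = max(abs(np.linalg.eigvals(M)))  # spectral radius, about 0.447

eps = 1e-10
k_est = np.log(1 / eps) / np.log(1 / rho)
print(f"rho = {rho:.3f}, predicted iterations ~ {k_est:.0f}")

# Watch the error shrink by roughly a factor rho each step (here c = 0, x* = 0)
x = np.array([1.0, 1.0])
for k in range(1, 6):
    x = M @ x
    print(k, np.linalg.norm(x))
```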
### Sufficient Conditions

Computing $\rho(M)$ is expensive. These conditions are easier to check:

If $A$ is strictly diagonally dominant, i.e. $|a_{ii}| > \sum_{j \neq i} |a_{ij}|$ for every row $i$, then the Jacobi method converges (and so does Gauss-Seidel).
Proof 3
Strict diagonal dominance implies $\|M\|_\infty < 1$ for the Jacobi iteration matrix $M = D^{-1}(L + U)$, and $\rho(M) \leq \|M\|$ for any induced matrix norm.
If $A$ is symmetric positive definite, then **Gauss-Seidel converges**.
(Jacobi may or may not converge for SPD matrices.)
## Connection to ODE Discretization

The same framework appears in time-stepping for ODEs!
For $\mathbf{y}' = A\mathbf{y}$, implicit Euler gives:
$$\mathbf{y}_{n+1} = \mathbf{y}_n + hA\mathbf{y}_{n+1}$$

Rearranging:
$$(I - hA)\mathbf{y}_{n+1} = \mathbf{y}_n \implies \mathbf{y}_{n+1} = (I - hA)^{-1}\mathbf{y}_n$$

Stability depends on $(I - hA)^{-1}$ being bounded; the Neumann series tells you when!
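A small sketch of implicit Euler for a linear system, solving $(I - hA)\mathbf{y}_{n+1} = \mathbf{y}_n$ at each step rather than forming the inverse (NumPy assumed; the test matrix is illustrative):

```python
import numpy as np

A = np.array([[-2.0, 1.0],
              [1.0, -3.0]])        # stable system: eigenvalues have negative real part
h = 0.1
I = np.eye(2)

y = np.array([1.0, -1.0])          # initial condition
for n in range(100):
    # implicit Euler step: solve the linear system instead of inverting (I - hA)
    y = np.linalg.solve(I - h * A, y)

print(y)                            # decays toward the equilibrium y* = 0
```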
Remark 2 (The Stability Connection)
| Context | Fixed Point Form | Convergence/Stability |
|---|---|---|
| Iterative solver | $\mathbf{x} = M\mathbf{x} + \mathbf{c}$ | $\rho(M) < 1$ |
| Implicit Euler | $\mathbf{y}_{n+1} = (I - hA)^{-1}\mathbf{y}_n$ | Eigenvalues in stability region |
| ODE equilibrium | $\mathbf{0} = A\mathbf{x}^*$ | $\operatorname{Re}(\lambda_i) < 0$ |
The same spectral analysis underlies all three!
## The Deeper Structure: Fredholm Theory

For compact operators $K$ (think: integral equations, discretized PDEs), the equation $(I - K)\mathbf{x} = \mathbf{c}$ has special structure:
The Fredholm alternative applies: either $(I - K)$ is invertible and a unique solution exists for every $\mathbf{c}$, or the kernel of $(I - K)$ is nontrivial but finite-dimensional, in which case solvability depends on $\mathbf{c}$ being orthogonal to the cokernel.
## Beyond Classical Methods

### Successive Over-Relaxation (SOR)

Gauss-Seidel can be accelerated with a relaxation parameter $\omega \in (0, 2)$:
$$x_i^{(k+1)} = (1 - \omega)\,x_i^{(k)} + \omega \cdot (\text{Gauss-Seidel update})$$

With $\omega = 1$ you get standard Gauss-Seidel; $\omega > 1$ (over-relaxation) can dramatically accelerate convergence if $\omega$ is chosen well; $\omega < 1$ (under-relaxation) increases stability. The optimal $\omega$ depends on the spectrum of $M_{GS}$.
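A sketch of SOR built on the Gauss-Seidel loop above; the choice `omega=1.5` is purely illustrative, not an optimal value:

```python
import numpy as np

def sor(A, b, omega=1.5, x0=None, tol=1e-8, max_iter=500):
    """SOR iteration: blend of the old value and the Gauss-Seidel update.
    omega = 1 recovers Gauss-Seidel; the best omega is problem dependent."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    n = len(b)
    x = np.zeros_like(b) if x0 is None else np.array(x0, dtype=float)
    for _ in range(max_iter):
        x_old = x.copy()
        for i in range(n):
            s = A[i, :i] @ x[:i] + A[i, i+1:] @ x[i+1:]
            gs_update = (b[i] - s) / A[i, i]              # the Gauss-Seidel value
            x[i] = (1 - omega) * x_old[i] + omega * gs_update
        if np.linalg.norm(x - x_old, ord=np.inf) < tol:
            return x
    return x
```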
### Modern Krylov Subspace Methods

Classical methods (Jacobi, Gauss-Seidel, SOR) are rarely used in modern large-scale computing. Krylov subspace methods dominate:
| Method | For | Key Property |
|---|---|---|
| Conjugate Gradient (CG) | SPD systems | Optimal in energy norm |
| GMRES | General systems | Minimizes residual |
| BiCGSTAB | Non-symmetric systems | Memory efficient |
These methods search for solutions in the Krylov subspace $\mathcal{K}_k(A, \mathbf{b}) = \operatorname{span}\{\mathbf{b}, A\mathbf{b}, A^2\mathbf{b}, \ldots, A^{k-1}\mathbf{b}\}$, finding good approximations using only matrix-vector products, never forming $A^{-1}$.
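For instance, SciPy exposes CG and GMRES behind exactly this matrix-vector-product interface; a minimal sketch, where the tridiagonal test matrix is an arbitrary SPD example:

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import cg, gmres

# Sparse SPD test matrix: tridiagonal with 4 on the diagonal (illustrative)
n = 1000
A = diags([-1.0, 4.0, -1.0], offsets=[-1, 0, 1], shape=(n, n), format="csr")
b = np.ones(n)

x_cg, info_cg = cg(A, b)        # Conjugate Gradient: requires A SPD
x_gm, info_gm = gmres(A, b)     # GMRES: works for general A
print(info_cg, info_gm)         # 0 means converged

# Both solvers touch A only through matrix-vector products A @ v
print(np.linalg.norm(A @ x_cg - b), np.linalg.norm(A @ x_gm - b))
```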
## When to Use What

| Situation | Method |
|---|---|
| Small/medium dense $A$ | Direct (LU, QR) |
| Large sparse $A$, SPD | Conjugate Gradient |
| Large sparse $A$, general | GMRES, BiCGSTAB |
| Multiple right-hand sides | Direct (reuse factorization) |
| Only $A\mathbf{v}$ available | Krylov methods |
The classical methods matter for understanding the theory: the Neumann series perspective, the role of $\rho(M)$, and the connection to fixed points and ODE stability. Modern methods build on these ideas with more sophisticated polynomial approximations in Krylov subspaces.