The Problem

Given linearly independent vectors $a_1, \ldots, a_n$, find an orthonormal basis $q_1, \ldots, q_n$ for the same subspace.

The Algorithm

The idea is simple: process vectors one at a time, subtracting off their projections onto previously computed orthonormal vectors.
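A minimal sketch of this loop in Python/NumPy (the function name `classical_gram_schmidt` and the column-matrix interface are illustrative choices, not prescribed by the text):

```python
import numpy as np

def classical_gram_schmidt(A):
    """Orthonormalize the (assumed linearly independent) columns of A."""
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    Q = np.zeros((m, n))
    for j in range(n):
        v = A[:, j].copy()
        # Subtract the projections of a_j onto all previously computed q_i.
        for i in range(j):
            v -= (A[:, j] @ Q[:, i]) * Q[:, i]
        # Normalize what remains to get q_j.
        Q[:, j] = v / np.linalg.norm(v)
    return Q
```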

Example 1 (Step-by-Step Gram-Schmidt)

Given $a_1 = \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}$, $a_2 = \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}$:

Step 1: Normalize a1a_1:

$$q_1 = \frac{a_1}{\|a_1\|} = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}$$

Step 2: Compute projection coefficient:

$$\langle a_2, q_1 \rangle = \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix} \cdot \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} = \frac{1}{\sqrt{2}}$$

Step 3: Subtract projection:

$$v_2 = a_2 - \langle a_2, q_1 \rangle \, q_1 = \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix} - \frac{1}{\sqrt{2}} \cdot \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 1/2 \\ -1/2 \\ 1 \end{pmatrix}$$

Step 4: Normalize:

$$\|v_2\| = \sqrt{1/4 + 1/4 + 1} = \sqrt{3/2}$$

$$q_2 = \frac{v_2}{\|v_2\|} = \sqrt{\frac{2}{3}} \begin{pmatrix} 1/2 \\ -1/2 \\ 1 \end{pmatrix} = \begin{pmatrix} 1/\sqrt{6} \\ -1/\sqrt{6} \\ 2/\sqrt{6} \end{pmatrix}$$

Verify orthogonality: $\langle q_1, q_2 \rangle = \frac{1}{\sqrt{2}} \cdot \frac{1}{\sqrt{6}} - \frac{1}{\sqrt{2}} \cdot \frac{1}{\sqrt{6}} + 0 = 0$
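The same steps can be checked numerically (a quick sketch using NumPy; the values in the comments are rounded):

```python
import numpy as np

a1 = np.array([1.0, 1.0, 0.0])
a2 = np.array([1.0, 0.0, 1.0])

q1 = a1 / np.linalg.norm(a1)      # [0.7071, 0.7071, 0]       = a1 / sqrt(2)
v2 = a2 - (a2 @ q1) * q1          # [0.5, -0.5, 1.0]
q2 = v2 / np.linalg.norm(v2)      # [0.4082, -0.4082, 0.8165] = [1, -1, 2] / sqrt(6)

print(q1 @ q2)                    # ~0: the two vectors are orthogonal
```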

From Gram-Schmidt to QR Factorization

Gram-Schmidt is actually computing a QR factorization! Rearranging the algorithm:

$$a_j = \sum_{i=1}^{j-1} \langle a_j, q_i \rangle \, q_i + \|v_j\| \, q_j$$

This means $a_j$ is a linear combination of $q_1, \ldots, q_j$.

In matrix form:

$$\underbrace{\begin{pmatrix} | & | & & | \\ a_1 & a_2 & \cdots & a_n \\ | & | & & | \end{pmatrix}}_{A} = \underbrace{\begin{pmatrix} | & | & & | \\ q_1 & q_2 & \cdots & q_n \\ | & | & & | \end{pmatrix}}_{Q} \underbrace{\begin{pmatrix} r_{11} & r_{12} & \cdots & r_{1n} \\ 0 & r_{22} & \cdots & r_{2n} \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & r_{nn} \end{pmatrix}}_{R}$$

where $r_{ij} = \langle a_j, q_i \rangle$ for $i < j$ and $r_{jj} = \|v_j\|$.
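As a sanity check, building $Q$ and $R$ this way does reproduce $A$. The sketch below uses the two vectors from Example 1 as the columns of a $3 \times 2$ matrix; the loop mirrors `classical_gram_schmidt` above, extended to record the coefficients in $R$:

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 0.0],
              [0.0, 1.0]])              # columns are a_1, a_2 from Example 1

m, n = A.shape
Q = np.zeros((m, n))
R = np.zeros((n, n))
for j in range(n):
    v = A[:, j].copy()
    for i in range(j):
        R[i, j] = A[:, j] @ Q[:, i]     # r_ij = <a_j, q_i>
        v -= R[i, j] * Q[:, i]
    R[j, j] = np.linalg.norm(v)         # r_jj = ||v_j||
    Q[:, j] = v / R[j, j]

print(np.allclose(Q @ R, A))            # True: A = QR
print(np.allclose(R, np.triu(R)))       # True: R is upper triangular
```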

Remark 1 (The Parallel: Gaussian Elimination → LU, Gram-Schmidt → QR)
| Algorithm | What it does | Matrix factorization |
| --- | --- | --- |
| Gaussian elimination | Row operations to get upper triangular | $A = LU$ |
| Gram-Schmidt | Orthogonalization to get orthonormal basis | $A = QR$ |

In both cases:

  • The algorithm performs a sequence of operations on columns/rows

  • Recording these operations as matrices gives the factorization

  • $L$ stores the elimination multipliers; $R$ stores the projection coefficients

  • $U$ is the reduced form; $Q$ is the orthonormal basis

Making This Concrete:

Gaussian elimination: Each step subtracts a multiple of one row from another. The multipliers $\ell_{ij}$ go into $L$; the result is $U$.

Gram-Schmidt: Each step subtracts projections onto previous vectors. The projection coefficients $r_{ij} = \langle a_j, q_i \rangle$ go into $R$; the orthonormal vectors form $Q$.

The triangular structure of $R$ reflects the sequential nature: $a_j$ only projects onto $q_1, \ldots, q_{j-1}$, so $r_{ij} = 0$ for $i > j$.

Numerical Problems with Classical Gram-Schmidt

Despite its elegance, classical Gram-Schmidt has serious numerical issues.

Catastrophic Cancellation

Example 2 (Nearly Parallel Columns)

Consider (from Trefethen & Bau):

$$A = \begin{pmatrix} 0.70000 & 0.70711 \\ 0.70001 & 0.70711 \end{pmatrix}$$

The columns are nearly parallel.

Why This Fails:

Computing $q_2$:

$$q_2 = \frac{a_2 - \langle a_2, q_1 \rangle \, q_1}{\|a_2 - \langle a_2, q_1 \rangle \, q_1\|}$$

The numerator involves subtracting two nearly equal vectors:

$$a_2 - \langle a_2, q_1 \rangle \, q_1 \approx \begin{pmatrix} 0.70711 \\ 0.70711 \end{pmatrix} - 1.00000 \begin{pmatrix} 0.70710 \\ 0.70711 \end{pmatrix}$$

Cancellation! We’re subtracting numbers that agree in many digits, losing precision. The result is dominated by rounding errors.
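A small demonstration in double precision (a sketch; in lower-precision arithmetic the damage is far worse, but the cancellation itself is already visible here):

```python
import numpy as np

A = np.array([[0.70000, 0.70711],
              [0.70001, 0.70711]])

q1 = A[:, 0] / np.linalg.norm(A[:, 0])
r12 = A[:, 1] @ q1
v2 = A[:, 1] - r12 * q1            # difference of two nearly equal vectors
print(v2)                          # tiny entries (~5e-6): leading digits cancelled
print(v2 / np.linalg.norm(v2))     # q_2 is built entirely from what survived
```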

Loss of Orthogonality

In exact arithmetic, Gram-Schmidt produces perfectly orthogonal vectors. In floating-point arithmetic, the computed $\hat{q}_i$ may not be orthogonal:

$$\langle \hat{q}_i, \hat{q}_j \rangle \neq 0 \quad \text{for } i \neq j$$

The loss of orthogonality scales with the condition number: for ill-conditioned matrices, the computed “orthonormal” vectors can be far from orthogonal. This defeats the entire purpose of orthogonalization!

Modified Gram-Schmidt

A simple reordering of operations improves stability:
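A minimal NumPy sketch of this reordering (the function name and the choice to return both $Q$ and $R$ are illustrative):

```python
import numpy as np

def modified_gram_schmidt(A):
    """QR via modified Gram-Schmidt: subtract each projection immediately."""
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for j in range(n):
        v = A[:, j].copy()
        for i in range(j):
            R[i, j] = v @ Q[:, i]      # coefficient from the *current* v, not a_j
            v -= R[i, j] * Q[:, i]     # update v before the next projection
        R[j, j] = np.linalg.norm(v)
        Q[:, j] = v / R[j, j]
    return Q, R
```

The only difference from the classical loop is the vector used in the inner dot product: `v` (already partially orthogonalized) instead of the original column `A[:, j]`.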

Why Is Modified Gram-Schmidt Better?

The key difference is when we compute the projection coefficients:

| Classical | Modified |
| --- | --- |
| $r_{ij} = \langle a_j, q_i \rangle$ | $r_{ij} = \langle v_j, q_i \rangle$ |
| Uses original $a_j$ | Uses current $v_j$ (already partially orthogonalized) |
Remark 2 (The Improvement in Modified GS)

In classical Gram-Schmidt, we compute all projections using the original vector $a_j$, then subtract them all at once. Rounding errors in the projections accumulate.

In modified Gram-Schmidt, we subtract each projection immediately, then compute the next projection using the updated vector. Each projection is computed against a vector that is already more orthogonal to the previous $q_i$'s.

Analogy: if you're trying to make a vector orthogonal to $q_1$ and $q_2$, classical GS computes both corrections using the original vector and subtracts them together. Modified GS first makes the vector orthogonal to $q_1$, then computes how to make that result orthogonal to $q_2$.

The second approach is more accurate because the second projection starts from a better approximation.
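For instance, for the third vector the two orderings are:

Classical:

$$v_3 = a_3 - \langle a_3, q_1 \rangle \, q_1 - \langle a_3, q_2 \rangle \, q_2$$

Modified:

$$v_3^{(1)} = a_3 - \langle a_3, q_1 \rangle \, q_1, \qquad v_3 = v_3^{(1)} - \langle v_3^{(1)}, q_2 \rangle \, q_2$$

In exact arithmetic these agree, since $\langle v_3^{(1)}, q_2 \rangle = \langle a_3, q_2 \rangle$ when $q_1 \perp q_2$; in floating point they differ, and the modified ordering is the more accurate one.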

However: Modified Gram-Schmidt still has orthogonality loss proportional to $\kappa(A) \cdot \varepsilon_{\text{mach}}$. For ill-conditioned matrices, this can still be unacceptable. We need a different approach.
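To make the comparison concrete, the sketch below measures $\|Q^\top Q - I\|$ for both variants on an ill-conditioned test matrix (it reuses the `classical_gram_schmidt` and `modified_gram_schmidt` sketches above; the Hilbert matrix is just a convenient example with a huge condition number):

```python
import numpy as np

def orthogonality_error(Q):
    """|| Q^T Q - I ||: zero for exactly orthonormal columns."""
    return np.linalg.norm(Q.T @ Q - np.eye(Q.shape[1]))

n = 10
# Hilbert matrix: H[i, j] = 1 / (i + j + 1), condition number ~1e13 for n = 10.
H = np.array([[1.0 / (i + j + 1) for j in range(n)] for i in range(n)])

Q_cgs = classical_gram_schmidt(H)
Q_mgs, _ = modified_gram_schmidt(H)

print(np.linalg.cond(H))              # very large
print(orthogonality_error(Q_cgs))     # orthogonality essentially lost
print(orthogonality_error(Q_mgs))     # much better, but still ~ kappa(A) * eps
```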