Skip to article frontmatterSkip to article content
Site not loading correctly?

This may be due to an incorrect BASE_URL configuration. See the MyST Documentation for reference.

Big Idea

The Gram-Schmidt algorithm transforms any basis into an orthonormal basis—and in doing so, computes the QR factorization. Just as Gaussian elimination organized as matrix operations gives LU, Gram-Schmidt organized as matrix operations gives QR.

Prerequisite: Math 235

You learned the Gram-Schmidt process in linear algebra (Math 235). Here we revisit it with two new perspectives: (1) understanding it as computing a matrix factorization, and (2) analyzing its numerical behavior.

The Problem

Given linearly independent vectors a1,,ana_1, \ldots, a_n, find an orthonormal basis q1,,qnq_1, \ldots, q_n for the same subspace.

The Algorithm

The idea is simple: process vectors one at a time, subtracting off their projections onto previously computed orthonormal vectors.

Algorithm 1 (Classical Gram-Schmidt)

Input: Linearly independent vectors a1,,ana_1, \ldots, a_n

Output: Orthonormal vectors q1,,qnq_1, \ldots, q_n

  1. for j=1,2,,nj = 1, 2, \ldots, n:

  2. vj=aj\qquad v_j = a_j

  3. \qquad for i=1,2,,j1i = 1, 2, \ldots, j-1:

  4. vj=vjaj,qiqi\qquad\qquad v_j = v_j - \langle a_j, q_i \rangle \, q_i \quad (subtract projection)

  5. qj=vj/vj\qquad q_j = v_j / \|v_j\| \quad (normalize)

Example 1 (Step-by-Step Gram-Schmidt)

Given a1=(110)a_1 = \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}, a2=(101)a_2 = \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}:

Step 1: Normalize a1a_1:

q1=a1a1=12(110)q_1 = \frac{a_1}{\|a_1\|} = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}

Step 2: Compute projection coefficient:

a2,q1=(101)12(110)=12\langle a_2, q_1 \rangle = \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix} \cdot \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} = \frac{1}{\sqrt{2}}

Step 3: Subtract projection:

v2=a2a2,q1q1=(101)1212(110)=(1/21/21)v_2 = a_2 - \langle a_2, q_1 \rangle \, q_1 = \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix} - \frac{1}{\sqrt{2}} \cdot \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 1/2 \\ -1/2 \\ 1 \end{pmatrix}

Step 4: Normalize:

v2=1/4+1/4+1=3/2\|v_2\| = \sqrt{1/4 + 1/4 + 1} = \sqrt{3/2}
q2=v2v2=23(1/21/21)=(1/61/62/6)q_2 = \frac{v_2}{\|v_2\|} = \sqrt{\frac{2}{3}} \begin{pmatrix} 1/2 \\ -1/2 \\ 1 \end{pmatrix} = \begin{pmatrix} 1/\sqrt{6} \\ -1/\sqrt{6} \\ 2/\sqrt{6} \end{pmatrix}

Verify orthogonality: q1,q2=12161216+0=0\langle q_1, q_2 \rangle = \frac{1}{\sqrt{2}} \cdot \frac{1}{\sqrt{6}} - \frac{1}{\sqrt{2}} \cdot \frac{1}{\sqrt{6}} + 0 = 0

From Gram-Schmidt to QR Factorization

Gram-Schmidt is actually computing a QR factorization! Rearranging the algorithm:

aj=i=1j1aj,qiqi+vjqja_j = \sum_{i=1}^{j-1} \langle a_j, q_i \rangle \, q_i + \|v_j\| \, q_j

This means aja_j is a linear combination of q1,,qjq_1, \ldots, q_j.

In matrix form:

(a1a2an)A=(q1q2qn)Q(r11r12r1n0r22r2n00rnn)R\underbrace{\begin{pmatrix} | & | & & | \\ a_1 & a_2 & \cdots & a_n \\ | & | & & | \end{pmatrix}}_A = \underbrace{\begin{pmatrix} | & | & & | \\ q_1 & q_2 & \cdots & q_n \\ | & | & & | \end{pmatrix}}_Q \underbrace{\begin{pmatrix} r_{11} & r_{12} & \cdots & r_{1n} \\ 0 & r_{22} & \cdots & r_{2n} \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & r_{nn} \end{pmatrix}}_R

where:

Numerical Problems with Classical Gram-Schmidt

Despite its elegance, classical Gram-Schmidt has serious numerical issues.

When the columns of AA are nearly linearly dependent, the subtraction vj=ajaj,qiqiv_j = a_j - \sum \langle a_j, q_i \rangle q_i involves subtracting nearly equal vectors. This catastrophic cancellation means the computed vjv_j is dominated by rounding errors, and the resulting q^j\hat{q}_j is not orthogonal to the previous vectors.

The loss of orthogonality scales with the condition number of AA: the computed vectors satisfy q^i,q^j=O(κ(A)εmach)|\langle \hat{q}_i, \hat{q}_j \rangle| = \mathcal{O}(\kappa(A) \cdot \varepsilon_{\text{mach}}). For ill-conditioned matrices, this can be unacceptably large.

Example 2 (Loss of Orthogonality)

Consider (from Trefethen and Bau):

A=(0.700000.707110.700010.70711)A = \begin{pmatrix} 0.70000 & 0.70711 \\ 0.70001 & 0.70711 \end{pmatrix}

The columns are nearly parallel. Computing q2q_2 requires:

v2=a2a2,q1q1(0.707110.70711)1.00000(0.707100.70711)v_2 = a_2 - \langle a_2, q_1 \rangle \, q_1 \approx \begin{pmatrix} 0.70711 \\ 0.70711 \end{pmatrix} - 1.00000 \begin{pmatrix} 0.70710 \\ 0.70711 \end{pmatrix}

The subtraction cancels most significant digits. The result is dominated by rounding errors, so q^2\hat{q}_2 is far from orthogonal to q1q_1.

This motivates the Householder approach: instead of orthogonalizing columns by subtraction (which cancels), we triangularize AA by multiplication with orthogonal matrices, which preserves norms and avoids cancellation entirely.