Gram-Schmidt Orthogonalization
The Gram-Schmidt algorithm transforms any basis into an orthonormal basis, and in doing so computes the QR factorization. Just as organizing Gaussian elimination as matrix operations gives $A = LU$, organizing Gram-Schmidt as matrix operations gives $A = QR$.
The Problem

Given linearly independent vectors $a_1, \ldots, a_n$, find an orthonormal basis $q_1, \ldots, q_n$ for the same subspace.
The Algorithm

The idea is simple: process the vectors one at a time, subtracting off their projections onto previously computed orthonormal vectors.
Input: linearly independent vectors $a_1, \ldots, a_n$
Output: orthonormal vectors $q_1, \ldots, q_n$

for $j = 1, 2, \ldots, n$:
    $v_j = a_j$
    for $i = 1, 2, \ldots, j-1$:
        $v_j = v_j - \langle a_j, q_i \rangle \, q_i$  (subtract projection)
    $q_j = v_j / \|v_j\|$  (normalize)
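For reference, here is a minimal NumPy sketch of classical Gram-Schmidt as stated above (the function name `classical_gram_schmidt` is our own choice; the columns of the input matrix play the role of $a_1, \ldots, a_n$):

```python
import numpy as np

def classical_gram_schmidt(A):
    """Orthonormalize the columns of A by classical Gram-Schmidt.

    Assumes the columns are linearly independent; returns Q whose columns
    are orthonormal and span the same subspace as the columns of A.
    """
    m, n = A.shape
    Q = np.zeros((m, n))
    for j in range(n):
        v = A[:, j].astype(float)            # v_j = a_j (working copy)
        for i in range(j):
            # subtract the projection of the ORIGINAL a_j onto q_i
            v -= (A[:, j] @ Q[:, i]) * Q[:, i]
        Q[:, j] = v / np.linalg.norm(v)      # normalize
    return Q
```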
Example 1 (Step-by-Step Gram-Schmidt)
Given $a_1 = \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}$, $a_2 = \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}$:
Step 1: Normalize $a_1$:
$$q_1 = \frac{a_1}{\|a_1\|} = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}$$

Step 2: Compute the projection coefficient:
$$\langle a_2, q_1 \rangle = \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix} \cdot \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} = \frac{1}{\sqrt{2}}$$

Step 3: Subtract the projection:
$$v_2 = a_2 - \langle a_2, q_1 \rangle \, q_1 = \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix} - \frac{1}{\sqrt{2}} \cdot \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 1/2 \\ -1/2 \\ 1 \end{pmatrix}$$

Step 4: Normalize:
$$\|v_2\| = \sqrt{1/4 + 1/4 + 1} = \sqrt{3/2}, \qquad q_2 = \frac{v_2}{\|v_2\|} = \sqrt{\frac{2}{3}} \begin{pmatrix} 1/2 \\ -1/2 \\ 1 \end{pmatrix} = \begin{pmatrix} 1/\sqrt{6} \\ -1/\sqrt{6} \\ 2/\sqrt{6} \end{pmatrix}$$

Verify orthogonality: $\langle q_1, q_2 \rangle = \frac{1}{\sqrt{2}} \cdot \frac{1}{\sqrt{6}} - \frac{1}{\sqrt{2}} \cdot \frac{1}{\sqrt{6}} + 0 = 0$ ✓
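The arithmetic above is easy to double-check numerically; here is a quick sketch mirroring the four steps (the values in the comments are the exact ones from the example):

```python
import numpy as np

a1 = np.array([1.0, 1.0, 0.0])
a2 = np.array([1.0, 0.0, 1.0])

q1 = a1 / np.linalg.norm(a1)     # step 1: q1 = a1 / sqrt(2)
r12 = a2 @ q1                    # step 2: <a2, q1> = 1/sqrt(2) ≈ 0.7071
v2 = a2 - r12 * q1               # step 3: (1/2, -1/2, 1)
q2 = v2 / np.linalg.norm(v2)     # step 4: (1, -1, 2) / sqrt(6)

print(q1 @ q2)                   # ≈ 0: the vectors are orthogonal
```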
From Gram-Schmidt to QR Factorization

Gram-Schmidt is actually computing a QR factorization! Rearranging the algorithm:
$$a_j = \sum_{i=1}^{j-1} \langle a_j, q_i \rangle \, q_i + \|v_j\| \, q_j$$

This means $a_j$ is a linear combination of $q_1, \ldots, q_j$.
In matrix form:
$$\underbrace{\begin{pmatrix} | & | & & | \\ a_1 & a_2 & \cdots & a_n \\ | & | & & | \end{pmatrix}}_A = \underbrace{\begin{pmatrix} | & | & & | \\ q_1 & q_2 & \cdots & q_n \\ | & | & & | \end{pmatrix}}_Q \underbrace{\begin{pmatrix} r_{11} & r_{12} & \cdots & r_{1n} \\ 0 & r_{22} & \cdots & r_{2n} \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & r_{nn} \end{pmatrix}}_R$$

where:
- $r_{ij} = \langle a_j, q_i \rangle$ for $i < j$ (the projection coefficients)
- $r_{jj} = \|v_j\|$ (the normalization factor)
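Recording these coefficients while orthogonalizing turns the earlier sketch into a QR factorization; a minimal version (the name `cgs_qr` is ours):

```python
import numpy as np

def cgs_qr(A):
    """QR factorization of A via classical Gram-Schmidt.

    Returns Q with orthonormal columns and upper triangular R with A = Q R.
    Assumes the columns of A are linearly independent.
    """
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for j in range(n):
        v = A[:, j].astype(float)
        for i in range(j):
            R[i, j] = A[:, j] @ Q[:, i]   # r_ij = <a_j, q_i>
            v -= R[i, j] * Q[:, i]        # subtract the projection
        R[j, j] = np.linalg.norm(v)       # r_jj = ||v_j||
        Q[:, j] = v / R[j, j]
    return Q, R

# quick sanity check on a small random matrix
A = np.random.rand(5, 3)
Q, R = cgs_qr(A)
print(np.allclose(Q @ R, A))              # True: A = Q R
print(np.allclose(Q.T @ Q, np.eye(3)))    # True (up to rounding) for this well-conditioned A
```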
Remark 1 (The Parallel: Gaussian Elimination → LU, Gram-Schmidt → QR)

| Algorithm | What it does | Matrix factorization |
| --- | --- | --- |
| Gaussian elimination | Row operations to get upper triangular | $A = LU$ |
| Gram-Schmidt | Orthogonalization to get orthonormal basis | $A = QR$ |
In both cases:
- The algorithm performs a sequence of operations on columns/rows
- Recording these operations as matrices gives the factorization
- $L$ stores the elimination multipliers; $R$ stores the projection coefficients
- $U$ is the reduced form; $Q$ is the orthonormal basis
Making This Concrete:
Gaussian elimination: Each step subtracts a multiple of one row from another. The multipliers $\ell_{ij}$ go into $L$; the result is $U$.
Gram-Schmidt: Each step subtracts projections onto previous vectors. The projection coefficients $r_{ij} = \langle a_j, q_i \rangle$ go into $R$; the orthonormal vectors form $Q$.
The triangular structure of $R$ reflects the sequential nature of the algorithm: $a_j$ is projected only onto $q_1, \ldots, q_{j-1}$, so $r_{ij} = 0$ for $i > j$.
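A small illustration of the parallel, using library routines (SciPy's `lu` and NumPy's `qr`; note that `np.linalg.qr` computes the same factorization by Householder reflections rather than Gram-Schmidt, and the matrix here is just an arbitrary example):

```python
import numpy as np
from scipy.linalg import lu

A = np.array([[2.0, 1.0, 1.0],
              [4.0, 3.0, 3.0],
              [8.0, 7.0, 9.0]])

# Gaussian elimination recorded as matrices: A = P L U
# (L is unit lower triangular and stores the multipliers; U is upper triangular)
P, L, U = lu(A)

# Orthogonalization recorded as matrices: A = Q R
# (Q has orthonormal columns; R is upper triangular)
Q, R = np.linalg.qr(A)

print(np.allclose(P @ L @ U, A), np.allclose(Q @ R, A))   # True True
```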
Numerical Problems with Classical Gram-Schmidt

Despite its elegance, classical Gram-Schmidt has serious numerical issues.
Catastrophic Cancellation

Example 2 (Nearly Parallel Columns)
Consider (from Trefethen & Bau):
$$A = \begin{pmatrix} 0.70000 & 0.70711 \\ 0.70001 & 0.70711 \end{pmatrix}$$

The columns are nearly parallel.
Why This Fails:
Computing $q_2$:
$$q_2 = \frac{a_2 - \langle a_2, q_1 \rangle \, q_1}{\|a_2 - \langle a_2, q_1 \rangle \, q_1\|}$$

The numerator involves subtracting two nearly equal vectors:
$$a_2 - \langle a_2, q_1 \rangle \, q_1 \approx \begin{pmatrix} 0.70711 \\ 0.70711 \end{pmatrix} - 1.00000 \begin{pmatrix} 0.70710 \\ 0.70711 \end{pmatrix}$$

Cancellation! We're subtracting numbers that agree in many digits, losing precision. The result is dominated by rounding errors.
Loss of Orthogonality

In exact arithmetic, Gram-Schmidt produces perfectly orthogonal vectors. In floating-point arithmetic, the computed $\hat{q}_i$ may not be orthogonal:
$$\langle \hat{q}_i, \hat{q}_j \rangle \neq 0 \quad \text{for } i \neq j$$

The loss of orthogonality scales with the condition number: for ill-conditioned matrices, the computed "orthonormal" vectors can be far from orthogonal. This defeats the entire purpose of orthogonalization!
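To make this concrete, one can run classical Gram-Schmidt on the matrix from Example 2 in single precision (a stand-in for the five-digit arithmetic of the example). Since $\kappa(A)$ is on the order of $10^5$ here, the computed $\langle \hat{q}_1, \hat{q}_2 \rangle$ is typically many orders of magnitude larger than single-precision machine epsilon ($\approx 1.2 \times 10^{-7}$):

```python
import numpy as np

# Trefethen & Bau's nearly parallel columns, stored in single precision
A = np.array([[0.70000, 0.70711],
              [0.70001, 0.70711]], dtype=np.float32)
a1, a2 = A[:, 0], A[:, 1]

q1 = a1 / np.linalg.norm(a1)
v2 = a2 - (a2 @ q1) * q1          # subtraction of nearly equal vectors: cancellation
q2 = v2 / np.linalg.norm(v2)

print(np.linalg.cond(A.astype(np.float64)))   # condition number, on the order of 1e5
print(abs(q1 @ q2))                           # typically far above eps_single ≈ 1.2e-7
```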
Modified Gram-Schmidt

A simple reordering of operations improves stability:
Input: linearly independent vectors $a_1, \ldots, a_n$
Output: orthonormal vectors $q_1, \ldots, q_n$ and upper triangular $R$

for $j = 1, 2, \ldots, n$:
    $v_j = a_j$
    for $i = 1, 2, \ldots, j-1$:
        $r_{ij} = \langle v_j, q_i \rangle$  (use the current $v_j$, not the original $a_j$)
        $v_j = v_j - r_{ij} \, q_i$
    $r_{jj} = \|v_j\|$
    $q_j = v_j / r_{jj}$
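A NumPy sketch of modified Gram-Schmidt, mirroring the pseudocode (the function name is our own):

```python
import numpy as np

def modified_gram_schmidt(A):
    """QR factorization of A via modified Gram-Schmidt: returns Q, R with A = Q R."""
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for j in range(n):
        v = A[:, j].astype(float)
        for i in range(j):
            R[i, j] = v @ Q[:, i]     # coefficient from the CURRENT v_j ...
            v -= R[i, j] * Q[:, i]    # ... subtracted immediately
        R[j, j] = np.linalg.norm(v)
        Q[:, j] = v / R[j, j]
    return Q, R
```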
Why Is Modified Gram-Schmidt Better?

The key difference is when we compute the projection coefficients:

| Classical | Modified |
| --- | --- |
| $r_{ij} = \langle a_j, q_i \rangle$ | $r_{ij} = \langle v_j, q_i \rangle$ |
| Uses the original $a_j$ | Uses the current $v_j$ (already partially orthogonalized) |
Remark 2 (The Improvement in Modified GS)
In classical Gram-Schmidt, we compute all projections using the original vector $a_j$, then subtract them all at once. Rounding errors in the projections accumulate.
In modified Gram-Schmidt, we subtract each projection immediately, then compute the next projection using the updated vector. Each projection is computed against a vector that is already more nearly orthogonal to the previous $q_i$'s.
Analogy: if you are trying to make a vector orthogonal to $q_1$ and $q_2$, classical GS computes both corrections from the original vector and subtracts them together. Modified GS first makes the vector orthogonal to $q_1$, then computes how to make that result orthogonal to $q_2$.
The second approach is more accurate because the second projection starts from a better approximation.
However: Modified Gram-Schmidt still has orthogonality loss proportional to $\kappa(A) \cdot \varepsilon_{\text{mach}}$. For ill-conditioned matrices, this can still be unacceptable. We need a different approach.
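The difference, and the remaining gap, can be seen experimentally. The sketch below (one function covering both variants, with a Hilbert matrix as a standard ill-conditioned test case) measures the loss of orthogonality $\|\hat{Q}^T \hat{Q} - I\|$. The exact numbers depend on the machine, but classical GS typically loses orthogonality almost completely, modified GS does markedly better yet stays far from machine precision, while Householder-based QR (`np.linalg.qr`) typically stays near $10^{-15}$:

```python
import numpy as np
from scipy.linalg import hilbert

def gram_schmidt(A, modified=True):
    """Gram-Schmidt QR; modified=False gives the classical variant."""
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for j in range(n):
        v = A[:, j].astype(float)
        for i in range(j):
            u = v if modified else A[:, j]    # MGS projects the current v_j, CGS the original a_j
            R[i, j] = u @ Q[:, i]
            v -= R[i, j] * Q[:, i]
        R[j, j] = np.linalg.norm(v)
        Q[:, j] = v / R[j, j]
    return Q, R

A = hilbert(10)                               # kappa(A) ~ 1e13: very ill-conditioned
I = np.eye(A.shape[1])

Qc, _ = gram_schmidt(A, modified=False)
Qm, _ = gram_schmidt(A, modified=True)
Qh, _ = np.linalg.qr(A)

print("classical  :", np.linalg.norm(Qc.T @ Qc - I))
print("modified   :", np.linalg.norm(Qm.T @ Qm - I))
print("householder:", np.linalg.norm(Qh.T @ Qh - I))
```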