Gram-Schmidt Orthogonalization
The Gram-Schmidt algorithm transforms any basis into an orthonormal basis, and in doing so computes the QR factorization. Just as organizing Gaussian elimination as matrix operations gives $A = LU$, organizing Gram-Schmidt as matrix operations gives $A = QR$.
The Problem

Given linearly independent vectors $a_1, \ldots, a_n$, find an orthonormal basis $q_1, \ldots, q_n$ for the same subspace.
The Algorithm

The idea is simple: process the vectors one at a time, subtracting off their projections onto previously computed orthonormal vectors.
Input: linearly independent vectors $a_1, \ldots, a_n$
Output: orthonormal vectors $q_1, \ldots, q_n$

for $j = 1, 2, \ldots, n$:
    $v_j = a_j$
    for $i = 1, 2, \ldots, j-1$:
        $v_j = v_j - \langle a_j, q_i \rangle \, q_i$  (subtract projection)
    $q_j = v_j / \|v_j\|$  (normalize)
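For reference, here is a minimal NumPy sketch of classical Gram-Schmidt as stated above (the function name `classical_gram_schmidt` is our own choice; the columns of the input matrix play the role of $a_1, \ldots, a_n$):

```python
import numpy as np

def classical_gram_schmidt(A):
    """Orthonormalize the columns of A by classical Gram-Schmidt.

    Assumes the columns are linearly independent; returns Q whose columns
    are orthonormal and span the same subspace as the columns of A.
    """
    m, n = A.shape
    Q = np.zeros((m, n))
    for j in range(n):
        v = A[:, j].astype(float)            # v_j = a_j (working copy)
        for i in range(j):
            # subtract the projection of the ORIGINAL a_j onto q_i
            v -= (A[:, j] @ Q[:, i]) * Q[:, i]
        Q[:, j] = v / np.linalg.norm(v)      # normalize
    return Q
```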
Example 1 (Step-by-Step Gram-Schmidt)
Given $a_1 = \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}$, $a_2 = \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}$:
Step 1: Normalize $a_1$:
$$q_1 = \frac{a_1}{\|a_1\|} = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}$$

Step 2: Compute the projection coefficient:
$$\langle a_2, q_1 \rangle = \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix} \cdot \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} = \frac{1}{\sqrt{2}}$$

Step 3: Subtract the projection:
$$v_2 = a_2 - \langle a_2, q_1 \rangle \, q_1 = \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix} - \frac{1}{\sqrt{2}} \cdot \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 1/2 \\ -1/2 \\ 1 \end{pmatrix}$$

Step 4: Normalize:
$$\|v_2\| = \sqrt{1/4 + 1/4 + 1} = \sqrt{3/2}, \qquad q_2 = \frac{v_2}{\|v_2\|} = \sqrt{\frac{2}{3}} \begin{pmatrix} 1/2 \\ -1/2 \\ 1 \end{pmatrix} = \begin{pmatrix} 1/\sqrt{6} \\ -1/\sqrt{6} \\ 2/\sqrt{6} \end{pmatrix}$$

Verify orthogonality: $\langle q_1, q_2 \rangle = \frac{1}{\sqrt{2}} \cdot \frac{1}{\sqrt{6}} - \frac{1}{\sqrt{2}} \cdot \frac{1}{\sqrt{6}} + 0 = 0$ ✓
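The arithmetic above is easy to double-check numerically; here is a quick sketch mirroring the four steps (the values in the comments are the exact ones from the example):

```python
import numpy as np

a1 = np.array([1.0, 1.0, 0.0])
a2 = np.array([1.0, 0.0, 1.0])

q1 = a1 / np.linalg.norm(a1)     # step 1: q1 = a1 / sqrt(2)
r12 = a2 @ q1                    # step 2: <a2, q1> = 1/sqrt(2) ≈ 0.7071
v2 = a2 - r12 * q1               # step 3: (1/2, -1/2, 1)
q2 = v2 / np.linalg.norm(v2)     # step 4: (1, -1, 2) / sqrt(6)

print(q1 @ q2)                   # ≈ 0: the vectors are orthogonal
```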
From Gram-Schmidt to QR Factorization

Gram-Schmidt is actually computing a QR factorization! Rearranging the algorithm:
$$a_j = \sum_{i=1}^{j-1} \langle a_j, q_i \rangle \, q_i + \|v_j\| \, q_j$$

This means $a_j$ is a linear combination of $q_1, \ldots, q_j$.
In matrix form:
$$\underbrace{\begin{pmatrix} | & | & & | \\ a_1 & a_2 & \cdots & a_n \\ | & | & & | \end{pmatrix}}_A = \underbrace{\begin{pmatrix} | & | & & | \\ q_1 & q_2 & \cdots & q_n \\ | & | & & | \end{pmatrix}}_Q \underbrace{\begin{pmatrix} r_{11} & r_{12} & \cdots & r_{1n} \\ 0 & r_{22} & \cdots & r_{2n} \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & r_{nn} \end{pmatrix}}_R$$

where:
- $r_{ij} = \langle a_j, q_i \rangle$ for $i < j$ (the projection coefficients)
- $r_{jj} = \|v_j\|$ (the normalization factor)
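Recording these coefficients while orthogonalizing turns the earlier sketch into a QR factorization; a minimal version (the name `cgs_qr` is ours):

```python
import numpy as np

def cgs_qr(A):
    """QR factorization of A via classical Gram-Schmidt.

    Returns Q with orthonormal columns and upper triangular R with A = Q R.
    Assumes the columns of A are linearly independent.
    """
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for j in range(n):
        v = A[:, j].astype(float)
        for i in range(j):
            R[i, j] = A[:, j] @ Q[:, i]   # r_ij = <a_j, q_i>
            v -= R[i, j] * Q[:, i]        # subtract the projection
        R[j, j] = np.linalg.norm(v)       # r_jj = ||v_j||
        Q[:, j] = v / R[j, j]
    return Q, R

# quick sanity check on a small random matrix
A = np.random.rand(5, 3)
Q, R = cgs_qr(A)
print(np.allclose(Q @ R, A))              # True: A = Q R
print(np.allclose(Q.T @ Q, np.eye(3)))    # True (up to rounding) for this well-conditioned A
```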
Remark 1 (The Parallel: Gaussian Elimination → LU, Gram-Schmidt → QR)

| Algorithm | What it does | Matrix factorization |
| --- | --- | --- |
| Gaussian elimination | Row operations to get upper triangular | $A = LU$ |
| Gram-Schmidt | Orthogonalization to get orthonormal basis | $A = QR$ |
In both cases:
- The algorithm performs a sequence of operations on columns/rows
- Recording these operations as matrices gives the factorization
- $L$ stores the elimination multipliers; $R$ stores the projection coefficients
- $U$ is the reduced form; $Q$ is the orthonormal basis
Making This Concrete:
Gaussian elimination: Each step subtracts a multiple of one row from another. The multipliers $\ell_{ij}$ go into $L$; the result is $U$.
Gram-Schmidt: Each step subtracts projections onto previous vectors. The projection coefficients $r_{ij} = \langle a_j, q_i \rangle$ go into $R$; the orthonormal vectors form $Q$.
The triangular structure of $R$ reflects the sequential nature of the algorithm: $a_j$ is projected only onto $q_1, \ldots, q_{j-1}$, so $r_{ij} = 0$ for $i > j$.
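A small illustration of the parallel, using library routines (SciPy's `lu` and NumPy's `qr`; note that `np.linalg.qr` computes the same factorization by Householder reflections rather than Gram-Schmidt, and the matrix here is just an arbitrary example):

```python
import numpy as np
from scipy.linalg import lu

A = np.array([[2.0, 1.0, 1.0],
              [4.0, 3.0, 3.0],
              [8.0, 7.0, 9.0]])

# Gaussian elimination recorded as matrices: A = P L U
# (L is unit lower triangular and stores the multipliers; U is upper triangular)
P, L, U = lu(A)

# Orthogonalization recorded as matrices: A = Q R
# (Q has orthonormal columns; R is upper triangular)
Q, R = np.linalg.qr(A)

print(np.allclose(P @ L @ U, A), np.allclose(Q @ R, A))   # True True
```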
Numerical Problems with Classical Gram-Schmidt

Despite its elegance, classical Gram-Schmidt has serious numerical issues.
Catastrophic Cancellation

Example 2 (Nearly Parallel Columns)
Consider (from Trefethen & Bau):
$$A = \begin{pmatrix} 0.70000 & 0.70711 \\ 0.70001 & 0.70711 \end{pmatrix}$$

The columns are nearly parallel.
Why This Fails:
Computing $q_2$:
$$q_2 = \frac{a_2 - \langle a_2, q_1 \rangle \, q_1}{\|a_2 - \langle a_2, q_1 \rangle \, q_1\|}$$

The numerator involves subtracting two nearly equal vectors:
$$a_2 - \langle a_2, q_1 \rangle \, q_1 \approx \begin{pmatrix} 0.70711 \\ 0.70711 \end{pmatrix} - 1.00000 \begin{pmatrix} 0.70710 \\ 0.70711 \end{pmatrix}$$

Cancellation! We're subtracting numbers that agree in many digits, losing precision. The result is dominated by rounding errors.
Loss of Orthogonality

In exact arithmetic, Gram-Schmidt produces perfectly orthogonal vectors. In floating-point arithmetic, the computed $\hat{q}_i$ may not be orthogonal:
$$\langle \hat{q}_i, \hat{q}_j \rangle \neq 0 \quad \text{for } i \neq j$$

The loss of orthogonality scales with the condition number: for ill-conditioned matrices, the computed "orthonormal" vectors can be far from orthogonal. This defeats the entire purpose of orthogonalization!
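To make this concrete, one can run classical Gram-Schmidt on the matrix from Example 2 in single precision (a stand-in for the five-digit arithmetic of the example). Since $\kappa(A)$ is on the order of $10^5$ here, the computed $\langle \hat{q}_1, \hat{q}_2 \rangle$ is typically many orders of magnitude larger than single-precision machine epsilon ($\approx 1.2 \times 10^{-7}$):

```python
import numpy as np

# Trefethen & Bau's nearly parallel columns, stored in single precision
A = np.array([[0.70000, 0.70711],
              [0.70001, 0.70711]], dtype=np.float32)
a1, a2 = A[:, 0], A[:, 1]

q1 = a1 / np.linalg.norm(a1)
v2 = a2 - (a2 @ q1) * q1          # subtraction of nearly equal vectors: cancellation
q2 = v2 / np.linalg.norm(v2)

print(np.linalg.cond(A.astype(np.float64)))   # condition number, on the order of 1e5
print(abs(q1 @ q2))                           # typically far above eps_single ≈ 1.2e-7
```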
Modified Gram-Schmidt

A simple reordering of operations improves stability:
Input: linearly independent vectors $a_1, \ldots, a_n$
Output: orthonormal vectors $q_1, \ldots, q_n$ and upper triangular $R$

for $j = 1, 2, \ldots, n$:
    $v_j = a_j$
    for $i = 1, 2, \ldots, j-1$:
        $r_{ij} = \langle v_j, q_i \rangle$  (use the current $v_j$, not the original $a_j$)
        $v_j = v_j - r_{ij} \, q_i$
    $r_{jj} = \|v_j\|$
    $q_j = v_j / r_{jj}$
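A NumPy sketch of modified Gram-Schmidt, mirroring the pseudocode (the function name is our own):

```python
import numpy as np

def modified_gram_schmidt(A):
    """QR factorization of A via modified Gram-Schmidt: returns Q, R with A = Q R."""
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for j in range(n):
        v = A[:, j].astype(float)
        for i in range(j):
            R[i, j] = v @ Q[:, i]     # coefficient from the CURRENT v_j ...
            v -= R[i, j] * Q[:, i]    # ... subtracted immediately
        R[j, j] = np.linalg.norm(v)
        Q[:, j] = v / R[j, j]
    return Q, R
```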
Why Is Modified Gram-Schmidt Better?

The key difference is when we compute the projection coefficients:

| Classical | Modified |
| --- | --- |
| $r_{ij} = \langle a_j, q_i \rangle$ | $r_{ij} = \langle v_j, q_i \rangle$ |
| Uses the original $a_j$ | Uses the current $v_j$ (already partially orthogonalized) |
Remark 2 (The Improvement in Modified GS)
In classical Gram-Schmidt, we compute all projections using the original vector $a_j$, then subtract them all at once. Rounding errors in the projections accumulate.
In modified Gram-Schmidt, we subtract each projection immediately, then compute the next projection using the updated vector. Each projection is computed against a vector that is already more nearly orthogonal to the previous $q_i$'s.
Analogy: if you are trying to make a vector orthogonal to $q_1$ and $q_2$, classical GS computes both corrections from the original vector and subtracts them together. Modified GS first makes the vector orthogonal to $q_1$, then computes how to make that result orthogonal to $q_2$.
The second approach is more accurate because the second projection starts from a better approximation.
However: Modified Gram-Schmidt still has orthogonality loss proportional to $\kappa(A) \cdot \varepsilon_{\text{mach}}$. For ill-conditioned matrices, this can still be unacceptable. We need a different approach.
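The difference, and the remaining gap, can be seen experimentally. The sketch below (one function covering both variants, with a Hilbert matrix as a standard ill-conditioned test case) measures the loss of orthogonality $\|\hat{Q}^T \hat{Q} - I\|$. The exact numbers depend on the machine, but classical GS typically loses orthogonality almost completely, modified GS does markedly better yet stays far from machine precision, while Householder-based QR (`np.linalg.qr`) typically stays near $10^{-15}$:

```python
import numpy as np
from scipy.linalg import hilbert

def gram_schmidt(A, modified=True):
    """Gram-Schmidt QR; modified=False gives the classical variant."""
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for j in range(n):
        v = A[:, j].astype(float)
        for i in range(j):
            u = v if modified else A[:, j]    # MGS projects the current v_j, CGS the original a_j
            R[i, j] = u @ Q[:, i]
            v -= R[i, j] * Q[:, i]
        R[j, j] = np.linalg.norm(v)
        Q[:, j] = v / R[j, j]
    return Q, R

A = hilbert(10)                               # kappa(A) ~ 1e13: very ill-conditioned
I = np.eye(A.shape[1])

Qc, _ = gram_schmidt(A, modified=False)
Qm, _ = gram_schmidt(A, modified=True)
Qh, _ = np.linalg.qr(A)

print("classical  :", np.linalg.norm(Qc.T @ Qc - I))
print("modified   :", np.linalg.norm(Qm.T @ Qm - I))
print("householder:", np.linalg.norm(Qh.T @ Qh - I))
```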