# QR Factorization

University of Massachusetts Amherst
## Why Orthogonal Matrices Are Perfect

For orthogonal $Q$ (i.e., $Q^TQ = I$):

- $\kappa_2(Q) = 1$: perfectly conditioned
- $\|Qx\|_2 = \|x\|_2$: preserves lengths
- No cancellation in $Qx$: errors don't amplify
This is why QR beats LU for least squares, and why Householder beats Gram-Schmidt.
When we multiply by an orthogonal matrix, rounding errors stay the same size—they don’t grow. This makes orthogonal transformations the tool of choice for stable numerical linear algebra.
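As a quick numerical check, here is a minimal NumPy sketch (the matrix size and random seed are arbitrary choices) that builds a random orthogonal matrix and verifies these properties:

```python
import numpy as np

rng = np.random.default_rng(0)
m = 200

# A random orthogonal matrix: the Q factor of a random square matrix.
Q, _ = np.linalg.qr(rng.standard_normal((m, m)))
x = rng.standard_normal(m)

print(np.linalg.cond(Q))                           # ~1: perfectly conditioned
print(np.linalg.norm(Q @ x) / np.linalg.norm(x))   # ~1: lengths preserved
print(np.linalg.norm(Q.T @ Q - np.eye(m)))         # tiny: orthogonal to machine precision
```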
## Why Not Gram-Schmidt?

Gram-Schmidt, even the modified version, is not the method of choice for computing QR in practice. The fundamental issue is that it works by subtracting projections, which leads to cancellation errors.
We need a different strategy: instead of orthogonalizing columns by subtraction, we'll triangularize $A$ by multiplication with orthogonal matrices:

$$
Q_n \cdots Q_2 Q_1 A = R \quad \Rightarrow \quad A = \underbrace{Q_1^T Q_2^T \cdots Q_n^T}_{Q} R
$$

Since orthogonal matrices preserve norms and don't amplify errors, this approach is backward stable.
## The QR Factorization

### Householder Reflections

The stable way to compute QR uses Householder reflections: orthogonal matrices that reflect vectors across hyperplanes.

Given a unit vector $v$, the Householder reflector is:

$$
H = I - 2vv^T
$$

This matrix reflects any vector across the hyperplane perpendicular to $v$.
Key properties:
- $H$ is symmetric: $H^T = H$
- $H$ is orthogonal: $H^T H = I$
- $H^2 = I$ (applying $H$ twice returns the original vector)
- $Hx$ reflects $x$ across the hyperplane $\{y : v^T y = 0\}$
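A short sketch (assuming NumPy; the dimension and test vector are arbitrary) that constructs $H = I - 2vv^T$ from a random unit vector and checks these properties numerically:

```python
import numpy as np

rng = np.random.default_rng(1)
m = 6

v = rng.standard_normal(m)
v /= np.linalg.norm(v)                   # unit vector defining the hyperplane
H = np.eye(m) - 2.0 * np.outer(v, v)     # Householder reflector

print(np.allclose(H, H.T))               # symmetric
print(np.allclose(H.T @ H, np.eye(m)))   # orthogonal
print(np.allclose(H @ H, np.eye(m)))     # applying twice gives the identity

x = rng.standard_normal(m)
# Reflection: the component of x along v flips sign, the rest is unchanged.
print(np.allclose(v @ (H @ x), -(v @ x)))
```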
### The Householder QR Algorithm

The idea: introduce zeros below the diagonal, column by column, using Householder reflections.

Step 1: Find a Householder reflector $H_1$ that zeros out $A_{2:m,1}$:

$$
H_1 A = \begin{pmatrix} * & * & \cdots & * \\ 0 & * & \cdots & * \\ 0 & * & \cdots & * \\ \vdots & \vdots & & \vdots \\ 0 & * & \cdots & * \end{pmatrix}
$$

Step 2: Find $H_2$ that zeros out $(H_1A)_{3:m,2}$, leaving column 1 unchanged:

$$
H_2 H_1 A = \begin{pmatrix} * & * & * & \cdots & * \\ 0 & * & * & \cdots & * \\ 0 & 0 & * & \cdots & * \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & * & \cdots & * \end{pmatrix}
$$

Continue until the matrix is upper triangular:

$$
H_n \cdots H_2 H_1 A = R
$$

Therefore:

$$
A = \underbrace{H_1 H_2 \cdots H_n}_{Q} R
$$

Since each $H_i$ is orthogonal, their product $Q$ is orthogonal.
### Computing the Householder Vector

To zero out everything below $a_1$ in a vector $a$, we need $Ha = \pm\|a\|_2 e_1$, i.e., a multiple of $e_1$ with the same 2-norm as $a$.
The Householder vector is:
$$
v = a + \text{sign}(a_1) \|a\|_2 e_1
$$

then normalize: $v \leftarrow v / \|v\|_2$.
Why the sign? To avoid cancellation when $a$ is nearly parallel to $e_1$: choosing the sign that matches $a_1$ means the first component adds two quantities of the same sign rather than subtracting nearly equal ones.
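Putting the pieces together, here is a minimal, unoptimized NumPy sketch of Householder QR. The name `householder_qr` and the explicit accumulation of $Q$ are illustration choices; production codes (e.g. LAPACK, which backs `np.linalg.qr`) store the reflector vectors instead of forming $Q$:

```python
import numpy as np

def householder_qr(A):
    """Factor A (m x n, m >= n) as A = Q R using Householder reflections."""
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    R = A.copy()
    Q = np.eye(m)

    for k in range(n):
        x = R[k:, k]
        norm_x = np.linalg.norm(x)
        if norm_x == 0.0:
            continue                                  # nothing to eliminate in this column
        # Householder vector v = x + sign(x_1)||x|| e_1, sign chosen to avoid cancellation.
        v = x.copy()
        v[0] += norm_x if x[0] >= 0 else -norm_x
        v /= np.linalg.norm(v)
        # Apply H_k = I - 2 v v^T to the trailing block of R; accumulate Q = H_1 H_2 ... H_n.
        R[k:, k:] -= 2.0 * np.outer(v, v @ R[k:, k:])
        Q[:, k:] -= 2.0 * np.outer(Q[:, k:] @ v, v)
    return Q, R

# Quick check on a random tall matrix.
rng = np.random.default_rng(2)
A = rng.standard_normal((7, 4))
Q, R = householder_qr(A)
print(np.allclose(Q @ R, A))                    # reconstructs A
print(np.allclose(Q.T @ Q, np.eye(7)))          # Q is orthogonal
print(np.allclose(np.tril(R, -1), 0.0))         # R is upper triangular
```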
## Comparison: Gram-Schmidt vs Householder

| Aspect | Gram-Schmidt | Householder |
|---|---|---|
| Approach | Orthogonalize columns | Introduce zeros |
| Computes | $Q$ explicitly | $Q$ implicitly (as product of $H_i$) |
| Stability | Can lose orthogonality | Backward stable |
| $\lVert Q^TQ - I \rVert$ | $O(\kappa(A)\epsilon)$ | $O(\epsilon)$ |
| Cost | $2mn^2$ | $2mn^2 - \frac{2}{3}n^3$ |
## Why Householder is Backward Stable

The Householder QR algorithm is backward stable: the computed factors $\hat{Q}, \hat{R}$ satisfy

$$
\hat{Q}\hat{R} = A + \delta A \quad \text{where} \quad \frac{\|\delta A\|}{\|A\|} = O(\varepsilon_{\text{mach}})
$$

Moreover, $\hat{Q}$ is orthogonal to machine precision: $\|\hat{Q}^T\hat{Q} - I\| = O(\varepsilon_{\text{mach}})$.
Proof:
Key insight 1: Orthogonal transformations preserve norms.
Each Householder reflector $H_k$ satisfies $\|H_k\|_2 = 1$. When we compute $H_k x$ in floating point, the error is:

$$
\text{fl}(H_k x) = H_k x + \delta_k \quad \text{with} \quad \|\delta_k\| = O(\varepsilon_{\text{mach}}) \|x\|
$$

The error is proportional to $\|x\|$, not amplified.
Key insight 2: No cancellation.
Gram-Schmidt computes:
$$
v_j = a_j - \sum_{i<j} \langle a_j, q_i \rangle \, q_i \quad \text{(subtraction → cancellation risk)}
$$

Householder computes:
$$
H_k a = \pm\|a\|_2 e_1 \quad \text{(orthogonal transformation → no cancellation)}
$$

The reflector introduces zeros by rotation/reflection, not by subtracting nearly equal quantities.
Key insight 3: Errors don’t accumulate badly.
After $n$ Householder transformations:

$$
H_n \cdots H_1 A = R + E \quad \text{with} \quad \|E\| = O(n \varepsilon_{\text{mach}}) \|A\|
$$

The factor of $n$ (not $2^n$!) comes from the fact that each orthogonal transformation only adds $O(\varepsilon_{\text{mach}})$ relative error.
Contrast with Gram-Schmidt: The loss of orthogonality in Gram-Schmidt scales as $\kappa(A) \cdot \varepsilon_{\text{mach}}$. For ill-conditioned matrices, this can be catastrophic. Householder QR maintains orthogonality to $O(\varepsilon_{\text{mach}})$ regardless of conditioning.
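To see this numerically, here is a small experiment (a sketch, assuming NumPy; the test matrices are synthetic, built with prescribed condition numbers from random orthogonal factors) comparing the orthogonality loss of modified Gram-Schmidt against the Householder-based `np.linalg.qr`:

```python
import numpy as np

def mgs(A):
    """Modified Gram-Schmidt QR, for comparison only."""
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    Q, R = A.copy(), np.zeros((n, n))
    for j in range(n):
        R[j, j] = np.linalg.norm(Q[:, j])
        Q[:, j] /= R[j, j]
        R[j, j+1:] = Q[:, j] @ Q[:, j+1:]
        Q[:, j+1:] -= np.outer(Q[:, j], R[j, j+1:])
    return Q, R

rng = np.random.default_rng(3)
m, n = 100, 50
U, _ = np.linalg.qr(rng.standard_normal((m, m)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))

for kappa in (1e2, 1e6, 1e10):
    s = np.geomspace(1.0, 1.0 / kappa, n)       # singular values, so cond(A) = kappa
    A = (U[:, :n] * s) @ V.T
    Q_mgs, _ = mgs(A)
    Q_hh, _ = np.linalg.qr(A)                   # Householder (LAPACK)
    err_mgs = np.linalg.norm(Q_mgs.T @ Q_mgs - np.eye(n))
    err_hh = np.linalg.norm(Q_hh.T @ Q_hh - np.eye(n))
    print(f"cond(A) = {kappa:.0e}:  MGS {err_mgs:.1e}   Householder {err_hh:.1e}")
```

The Gram-Schmidt column should show the $O(\kappa(A)\,\varepsilon_{\text{mach}})$ growth, while the Householder column stays near $\varepsilon_{\text{mach}}$ for every condition number.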
## Properties of the QR Factorization

Every $A \in \mathbb{R}^{m \times n}$ with $m \geq n$ has a QR factorization.
If $A$ has full column rank, the reduced QR factorization is unique up to the signs of the columns of $Q$ (and the corresponding rows of $R$); it becomes unique once we require $R$ to have positive diagonal entries.
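A small sketch of that normalization (assuming NumPy; LAPACK's Householder QR does not by itself guarantee a positive diagonal), flipping signs after the fact to obtain the unique normalized pair:

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((6, 4))

Q, R = np.linalg.qr(A)                 # reduced QR; diag(R) entries can have either sign
d = np.sign(np.diag(R))
d[d == 0] = 1.0                        # only relevant in the rank-deficient case
Q, R = Q * d, d[:, None] * R           # Q -> Q D and R -> D R with D = diag(d), D^2 = I

print(np.all(np.diag(R) > 0))          # normalized: positive diagonal
print(np.allclose(Q @ R, A))           # still a valid factorization
```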
## Applications of QR

- Solving least squares: more stable than the normal equations (see the sketch after this list)
- Eigenvalue algorithms: QR iteration
- Orthonormal basis: for the column space of $A$
- Rank determination: via the diagonal of $R$
- Linear system solving: when $A$ is square
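For the least-squares application in particular, a minimal sketch (random synthetic data, assuming NumPy) of solving $\min_x \|Ax - b\|_2$ via the reduced QR factorization:

```python
import numpy as np

rng = np.random.default_rng(5)
m, n = 100, 5
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

# The least-squares minimizer satisfies R x = Q^T b for the reduced QR of A.
Q, R = np.linalg.qr(A)                  # Q is m x n, R is n x n upper triangular
x = np.linalg.solve(R, Q.T @ b)         # in practice: back substitution on the triangular R

print(np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0]))   # agrees with lstsq
```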
## Cost Analysis

For $A \in \mathbb{R}^{m \times n}$:

| Operation | Flops |
|---|---|
| QR factorization | $2mn^2 - \frac{2}{3}n^3$ |
| Compute $Q$ explicitly | additional $2mn^2$ |
| Apply $Q^T$ to a vector | $4mn - 2n^2$ |
For least squares, we often don't need $Q$ explicitly, just $Q^T b$.