Skip to article frontmatterSkip to article content
Site not loading correctly?

This may be due to an incorrect BASE_URL configuration. See the MyST Documentation for reference.

Big Idea

The QR factorization decomposes any matrix into an orthogonal matrix times an upper triangular matrix. Householder reflections provide a numerically stable way to compute it, the gold standard for numerical linear algebra.

The QR Factorization

Definition 1 (QR Factorization)

For ARm×nA \in \mathbb{R}^{m \times n} with mnm \geq n, the QR factorization is:

A=QRA = QR

where QRm×mQ \in \mathbb{R}^{m \times m} is orthogonal (QTQ=IQ^T Q = I) and RRm×nR \in \mathbb{R}^{m \times n} is upper triangular.

Theorem 1 (Existence and Uniqueness of QR)

Every ARm×nA \in \mathbb{R}^{m \times n} with mnm \geq n has a QR factorization.

If AA has full column rank, the reduced QR is unique (up to signs of columns of QQ) when we require RR to have positive diagonal entries.

Proof 1

The Gram-Schmidt algorithm constructs an orthonormal basis q1,,qnq_1, \ldots, q_n for the column space of AA and expresses each column aja_j as a linear combination of q1,,qjq_1, \ldots, q_j:

aj=i=1jrijqia_j = \sum_{i=1}^{j} r_{ij} q_i

where rij=aj,qir_{ij} = \langle a_j, q_i \rangle for i<ji < j and rjj=vjr_{jj} = \|v_j\|. In matrix form this is A=QRA = QR with QQ orthogonal and RR upper triangular.

Uniqueness: if A=Q1R1=Q2R2A = Q_1 R_1 = Q_2 R_2 with positive diagonals, then Q2TQ1=R2R11Q_2^T Q_1 = R_2 R_1^{-1} is both orthogonal and upper triangular, hence a diagonal matrix with ±1\pm 1 entries. The positive diagonal constraint forces it to be II.

Computing QR: Householder Reflections

Gram-Schmidt constructs QR by orthogonalizing columns via subtraction, which is numerically unstable (see loss of orthogonality). The stable alternative is the same idea as column elimination in Gaussian elimination, with one substitution: instead of eliminating below the diagonal by subtracting multiples of rows (recorded as triangular factors LkL_k), we eliminate by applying orthogonal reflectors HkH_k that zero entries below the diagonal of each column. The triangular factors of GE are cheap but can amplify errors; the orthogonal reflectors of Householder preserve norms and do not.

Concretely, we triangularize AA by multiplying with orthogonal matrices:

QnQ2Q1A=RA=Q1TQ2TQnTQRQ_n \cdots Q_2 Q_1 A = R \quad \Rightarrow \quad A = \underbrace{Q_1^T Q_2^T \cdots Q_n^T}_{Q} R

Each QkQ_k is a Householder reflector that zeros out entries below the diagonal in column kk.

Householder Reflectors

Definition 2 (Householder Reflector)

Given a unit vector vv, the Householder reflector is:

H=I2vvTH = I - 2vv^T

This matrix reflects any vector across the hyperplane perpendicular to vv.

Proposition 1 (Properties of Householder Reflectors)

A Householder reflector H=I2vvTH = I - 2vv^T satisfies:

  1. HH is symmetric: HT=HH^T = H

  2. HH is orthogonal: HTH=IH^T H = I

  3. HH is an involution: H2=IH^2 = I

  4. HxHx reflects xx across the hyperplane {y:vTy=0}\{y : v^T y = 0\}

Proof 2

(1) (I2vvT)T=I2vvT(I - 2vv^T)^T = I - 2vv^T since (vvT)T=vvT(vv^T)^T = vv^T.

(2) HTH=H2=(I2vvT)(I2vvT)=I4vvT+4v(vTv)vT=IH^TH = H^2 = (I - 2vv^T)(I - 2vv^T) = I - 4vv^T + 4v(v^Tv)v^T = I since vTv=1v^Tv = 1.

(3) Follows from (2) and HT=HH^T = H.

(4) Decompose x=(vTx)v+(x(vTx)v)x = (v^Tx)v + (x - (v^Tx)v). The first term is the component along vv, the second is in the hyperplane. Then Hx=(vTx)v+(x(vTx)v)Hx = -(v^Tx)v + (x - (v^Tx)v): the component along vv is negated.

Source
<Figure size 1200x500 with 2 Axes>

Computing the Householder Vector

To zero out everything below the first entry of a vector aa, we need Ha=a2e1Ha = \|a\|_2 e_1. Since HH is a reflection and preserves norms, the target a2e1\|a\|_2 e_1 lies on the same sphere as aa (right panel above). The Householder vector vv is the bisector of aa and the target:

v=a+sign(a1)a2e1,vv/v2v = a + \text{sign}(a_1) \|a\|_2 e_1, \qquad v \leftarrow v / \|v\|_2

The sign choice avoids cancellation when aa is nearly parallel to e1e_1.

The Algorithm

Algorithm 1 (Householder QR)

Input: ARm×nA \in \mathbb{R}^{m \times n}

Output: Orthogonal QQ and upper triangular RR with A=QRA = QR

  1. for k=1,2,,nk = 1, 2, \ldots, n:

  2. \qquad Let a=Ak:m,ka = A_{k:m, k} (the subcolumn below the diagonal)

  3. \qquad v=a+sign(a1)a2e1v = a + \text{sign}(a_1)\|a\|_2 e_1, vv/v\quad v \leftarrow v/\|v\|

  4. \qquad Ak:m,k:nAk:m,k:n2v(vTAk:m,k:n)A_{k:m, k:n} \leftarrow A_{k:m, k:n} - 2v(v^T A_{k:m, k:n})

  5. Q=H1H2HnQ = H_1 H_2 \cdots H_n

Note that step 4 applies the reflection only to the submatrix, and the matrix-vector product vTAk:m,k:nv^T A_{k:m,k:n} avoids forming HH explicitly.

For ARm×nA \in \mathbb{R}^{m \times n}, the factorization costs 2mn223n32mn^2 - \tfrac{2}{3}n^3 flops; forming QQ explicitly costs an additional 2mn22mn^2, while applying QTQ^T to a single vector costs only 4mn2n24mn - 2n^2. In the square case m=nm = n, the factorization cost reduces to 43n3\tfrac{4}{3}n^3 flops, twice the cost of LU factorization. For least squares we usually do not need QQ explicitly, just QTbQ^T b, which can be computed by applying the Householder reflectors sequentially.

Numerical Stability

We developed the framework of forward and backward stability in the previous chapter. Recall that forward stability is unattainable (Proposition 2), and the right standard is backward stability: the computed result is the exact result for a slightly perturbed input. Householder QR achieves this.

What does this mean for a factorization? The algorithm takes AA and returns computed factors Q^,R^\hat{Q}, \hat{R}. These will not satisfy Q^R^=A\hat{Q}\hat{R} = A exactly. Backward stability asks for the next best thing: there exists a small perturbation δA\delta A such that Q^R^=A+δA\hat{Q}\hat{R} = A + \delta A exactly. In other words, the computed factors are the exact QR factorization of a nearby matrix. The size of δA\delta A is the backward error, and we want δA/A=O(εmach)\|\delta A\|/\|A\| = O(\varepsilon_{\text{mach}}).

Theorem 2 (Backward Stability of Householder QR)

The computed factors Q^,R^\hat{Q}, \hat{R} satisfy:

Q^R^=A+δAwhereδAA=O(εmach)\hat{Q}\hat{R} = A + \delta A \quad \text{where} \quad \frac{\|\delta A\|}{\|A\|} = O(\varepsilon_{\text{mach}})

Moreover, Q^\hat{Q} is orthogonal to machine precision: Q^TQ^I=O(εmach)\|\hat{Q}^T\hat{Q} - I\| = O(\varepsilon_{\text{mach}}).

Proof 3

(Sketch, following Higham, Accuracy and Stability of Numerical Algorithms, Ch. 19.)

The full proof tracks rounding errors through each Householder step. We outline the three key ingredients and how they combine.

Step 1: One Householder reflection is backward stable.

At step kk, we apply a Householder reflector Hk=I2vkvkTH_k = I - 2v_kv_k^T to the current matrix. The key operation is HkBH_k B for a submatrix BB. In floating point, the computed result satisfies

fl(HkB)=(Hk+δHk)B,δHk=O(εmach)\text{fl}(H_k B) = (H_k + \delta H_k) B, \quad \|\delta H_k\| = O(\varepsilon_{\text{mach}})

To see why δHk\delta H_k is small, look at how HkB=B2vk(vkTB)H_k B = B - 2 v_k (v_k^T B) is actually computed. The two intermediate quantities are the row vector wT=vkTBw^T = v_k^T B and the rank-one update 2vkwT2 v_k w^T. Apply the standard floating-point error model:

  • wTw^T is a row of dot products. Each computed entry w^j\hat{w}_j satisfies w^jwjcεvk2Bej2|\hat{w}_j - w_j| \le c\,\varepsilon\, \|v_k\|_2\, \|B e_j\|_2 for a small constant cc. Since vk2=1\|v_k\|_2 = 1, this is O(ε)B2O(\varepsilon)\,\|B\|_2.

  • The subtraction B2vkw^TB - 2 v_k \hat{w}^T adds another O(ε)B2O(\varepsilon)\,\|B\|_2 per entry from the multiply-add.

Collecting both, the computed result equals (Hk+δHk)B(H_k + \delta H_k) B with δHk2=O(ε)\|\delta H_k\|_2 = O(\varepsilon). The constant does not depend on B\|B\| because vk2=1\|v_k\|_2 = 1 and HkH_k does not stretch any direction.

Contrast this with Gaussian elimination. The elementary matrix Lk=I+kekTL_k = I + \ell_k e_k^T contains the multipliers k=aik/akk\ell_k = a_{ik}/a_{kk}, which can be arbitrarily large when akk|a_{kk}| is small. The analogous bound there carries a Lk\|L_k\| factor, which is exactly the growth factor that pivoting tries to control. Orthogonal HkH_k has Hk2=1\|H_k\|_2 = 1 unconditionally, so no such amplification appears.

Step 2: The perturbations compose additively, not multiplicatively.

After nn steps, the computed upper triangular factor R^\hat{R} satisfies

R^=(Hn+δHn)(H1+δH1)A.\hat{R} = (H_n + \delta H_n) \cdots (H_1 + \delta H_1) A.

We claim the perturbed product equals QT+EQ^T + E with E2=O(nεmach)\|E\|_2 = O(n\varepsilon_{\text{mach}}), where QT=HnH1Q^T = H_n \cdots H_1. The key point is the nn, not 2n2^n.

Start with two factors. Writing H^k=Hk+δHk\hat{H}_k = H_k + \delta H_k with δHk2=O(ε)\|\delta H_k\|_2 = O(\varepsilon),

H^2H^1=H2H1+H2δH1+δH2H1+δH2δH1.\hat{H}_2 \hat{H}_1 = H_2 H_1 + H_2\, \delta H_1 + \delta H_2\, H_1 + \delta H_2\, \delta H_1.

The last term is O(ε2)O(\varepsilon^2) and we drop it. The two first-order terms each contain a single δHk\delta H_k flanked by orthogonal factors. Since Hk2=1\|H_k\|_2 = 1, the flanking does not change the size: H2δH12=δH12=O(ε)\|H_2\, \delta H_1\|_2 = \|\delta H_1\|_2 = O(\varepsilon), and likewise for the other term. The total error is bounded by δH12+δH22=O(ε)\|\delta H_1\|_2 + \|\delta H_2\|_2 = O(\varepsilon), the sum of the factor errors rather than a product.

If H1,H2H_1, H_2 were not orthogonal the sandwich H2δH1H_2\, \delta H_1 would scale by H22\|H_2\|_2, and with nn factors the analogous bound carries a Hj2\prod \|H_j\|_2 amplification. Orthogonality is exactly what kills this multiplicative blow-up.

Extending to nn factors gives the same conclusion: each of the nn first-order terms is a single δHk\delta H_k flanked by orthogonal matrices, contributing O(ε)O(\varepsilon), so the total is O(nε)O(n\varepsilon).

Step 3: Rewriting as a backward error on AA.

The computed Q^\hat{Q} is implicitly defined as the inverse of the product of computed reflectors: Q^T:=H^nH^1=QT+E\hat{Q}^T := \hat{H}_n \cdots \hat{H}_1 = Q^T + E. Transposing, Q^=Q+ET\hat{Q} = Q + E^T. Substituting into Q^R^\hat{Q}\hat{R} and using R^=Q^TA=(QT+E)A\hat{R} = \hat{Q}^T A = (Q^T + E)A from Step 2,

Q^R^=(Q+ET)(QT+E)A=A+(QE+ETQT)A+O(ε2),\hat{Q}\hat{R} = (Q + E^T)(Q^T + E) A = A + (QE + E^T Q^T) A + O(\varepsilon^2),

where we used QQT=IQQ^T = I. Setting δA=(QE+ETQT)A\delta A = (QE + E^T Q^T) A, the relative backward error satisfies

δAA2Q2E2=O(nεmach),\frac{\|\delta A\|}{\|A\|} \leq 2\|Q\|_2\, \|E\|_2 = O(n\varepsilon_{\text{mach}}),

since Q2=1\|Q\|_2 = 1.

Takeaway. The proof works because orthogonal matrices have three properties that prevent error growth: unit norm (Q2=1\|Q\|_2 = 1, no amplification), norm preservation (Qx2=x2\|Qx\|_2 = \|x\|_2, no cancellation), and unit condition number (κ2(Q)=1\kappa_2(Q) = 1, perturbations stay small under composition).

The orthogonality of the computed Q^\hat{Q} is the key difference from Gram-Schmidt. For Gram-Schmidt, the loss of orthogonality is Q^TQ^I=O(κ(A)εmach)\|\hat{Q}^T\hat{Q} - I\| = O(\kappa(A) \cdot \varepsilon_{\text{mach}}), which can be catastrophic for ill-conditioned matrices. Householder maintains O(εmach)O(\varepsilon_{\text{mach}}) regardless of conditioning.

Why Orthogonal Transformations Are Ideal

The stability of Householder QR is not accidental. It reflects a deeper principle: errors under composition of orthogonal matrices add, they do not multiply.

For a single orthogonal QQ this is obvious. If xx carries an error ee, then

Q(x+e)Qx2=Qe2=e2.\|Q(x+e) - Qx\|_2 = \|Qe\|_2 = \|e\|_2.

The error is transported, never magnified. The composition property is the consequence we cashed in during Step 2 of Theorem 2. The perturbed product expands to

(Hn+δHn)(H1+δH1)=HnH1+k=1nHnHk+1δHkHk1H1+O(ε2).(H_n + \delta H_n) \cdots (H_1 + \delta H_1) = H_n \cdots H_1 + \sum_{k=1}^{n} H_n \cdots H_{k+1}\, \delta H_k\, H_{k-1} \cdots H_1 + O(\varepsilon^2).

For orthogonal HkH_k each summand has norm δHk2\|\delta H_k\|_2 because the flanking factors have unit norm, so the total first-order error is bounded by

k=1nδHk2=O(nε).\sum_{k=1}^{n} \|\delta H_k\|_2 = O(n\varepsilon).

For general invertible factors MkM_k the same expansion holds, but the sandwich inequality gives

MnMk+1δMkMk1M12δMk2jkMj2,\| M_n \cdots M_{k+1}\, \delta M_k\, M_{k-1} \cdots M_1 \|_2 \leq \|\delta M_k\|_2 \prod_{j \neq k} \|M_j\|_2,

so the total error is bounded by

k=1nδMk2jkMj2=O ⁣(εj=1nMj2),\sum_{k=1}^{n} \|\delta M_k\|_2 \prod_{j \neq k} \|M_j\|_2 = O\!\left( \varepsilon \prod_{j=1}^{n} \|M_j\|_2 \right),

multiplicative in the factor norms. If each Mj2=M2\|M_j\|_2 = \|M\|_2, this is O(εM2n)O(\varepsilon \|M\|_2^n), the exponential blow-up that orthogonality avoids.

This is why orthogonal transformations are the tool of choice throughout numerical linear algebra, from QR factorization to eigenvalue algorithms: they are the only family of matrices for which long sequences of operations do not amplify rounding errors.

Solving Linear Systems with QR

Given a square system Ax=bA\mathbf{x} = \mathbf{b} with ARn×nA \in \mathbb{R}^{n \times n} invertible, the QR factorization reduces it to a triangular solve:

Ax=bQRx=bRx=QTbA\mathbf{x} = \mathbf{b} \quad \Rightarrow \quad QR\mathbf{x} = \mathbf{b} \quad \Rightarrow \quad R\mathbf{x} = Q^T\mathbf{b}

The steps are:

  1. Factor: compute A=QRA = QR via Householder (43n3\frac{4}{3}n^3 flops for square AA)

  2. Transform: compute b~=QTb\tilde{\mathbf{b}} = Q^T\mathbf{b} (O(n2)\mathcal{O}(n^2) flops)

  3. Solve: back substitution on Rx=b~R\mathbf{x} = \tilde{\mathbf{b}} (O(n2)\mathcal{O}(n^2) flops)

Example 1 (Solving a Linear System with QR)

Consider:

A=(1111),b=(31)A = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}, \quad \mathbf{b} = \begin{pmatrix} 3 \\ 1 \end{pmatrix}

Step 1: QR factorization. The first column has norm a1=2\|a_1\| = \sqrt{2}. The Householder vector maps a1a_1 to 2e1\sqrt{2}\,e_1. After applying the reflector to the full matrix:

Q=12(1111),R=(2002)Q = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}, \quad R = \begin{pmatrix} \sqrt{2} & 0 \\ 0 & \sqrt{2} \end{pmatrix}

(In this case AA is already orthogonal up to scaling.)

Step 2: Transform the right-hand side.

b~=QTb=12(1111)(31)=(4/22/2)=(222)\tilde{\mathbf{b}} = Q^T\mathbf{b} = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}\begin{pmatrix} 3 \\ 1 \end{pmatrix} = \begin{pmatrix} 4/\sqrt{2} \\ 2/\sqrt{2} \end{pmatrix} = \begin{pmatrix} 2\sqrt{2} \\ \sqrt{2} \end{pmatrix}

Step 3: Back substitution.

Rx=b~    (2002)(x1x2)=(222)    x=(21)R\mathbf{x} = \tilde{\mathbf{b}} \implies \begin{pmatrix} \sqrt{2} & 0 \\ 0 & \sqrt{2} \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 2\sqrt{2} \\ \sqrt{2} \end{pmatrix} \implies \mathbf{x} = \begin{pmatrix} 2 \\ 1 \end{pmatrix}

Compared to LU, the QR approach costs roughly twice as many flops (43n3\frac{4}{3}n^3 for QR vs 23n3\frac{2}{3}n^3 for LU). For square systems where stability is not a concern, LU is preferred. The real advantage of QR appears for least squares problems, where the system is overdetermined and LU does not apply.