
The Problem with Normal Equations

Recall the normal equations: $A^T A \hat{x} = A^T b$.

The issue: $\kappa(A^T A) = \kappa(A)^2$.

If $\kappa(A) = 10^8$, then $\kappa(A^T A) = 10^{16}$, which is on the order of $1/\epsilon_{\text{mach}}$ in double precision, so the computed solution may carry no correct digits.
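To see the squaring numerically, here is a small illustrative check; the Vandermonde matrix below is an arbitrary choice for demonstration, not taken from the text:

import numpy as np

# Illustrative example: a modestly ill-conditioned Vandermonde matrix
t = np.linspace(0, 1, 50)
A = np.vander(t, 8, increasing=True)   # columns 1, t, ..., t^7

print(f"cond(A)     = {np.linalg.cond(A):.2e}")
print(f"cond(A^T A) = {np.linalg.cond(A.T @ A):.2e}")   # roughly cond(A)**2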

Reduced vs Full QR

For least squares, we need to distinguish two forms of the QR factorization:

Remark 1 (Full vs Reduced QR)

Full QR: $A = QR$ where $Q \in \mathbb{R}^{m \times m}$, $R \in \mathbb{R}^{m \times n}$:

$$\underbrace{A}_{m \times n} = \underbrace{Q}_{m \times m} \underbrace{\begin{pmatrix} R_1 \\ 0 \end{pmatrix}}_{m \times n}$$

Reduced (thin) QR: $A = \hat{Q}\hat{R}$ where $\hat{Q} \in \mathbb{R}^{m \times n}$, $\hat{R} \in \mathbb{R}^{n \times n}$:

$$\underbrace{A}_{m \times n} = \underbrace{\hat{Q}}_{m \times n} \underbrace{\hat{R}}_{n \times n}$$
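In NumPy, the two forms correspond to the mode argument of np.linalg.qr. A minimal sketch, with a random matrix used only for illustration:

import numpy as np

A = np.random.default_rng(0).standard_normal((6, 3))   # m = 6, n = 3

Q_full, R_full = np.linalg.qr(A, mode='complete')   # full QR: Q is 6 x 6
Q_hat, R_hat = np.linalg.qr(A, mode='reduced')      # reduced (thin) QR: Q_hat is 6 x 3

print(Q_full.shape, R_full.shape)   # (6, 6) (6, 3)
print(Q_hat.shape, R_hat.shape)     # (6, 3) (3, 3)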

Which to Use?

  • Reduced QR is more memory-efficient (stores only $mn$ entries for $\hat{Q}$ versus $m^2$ for $Q$)

  • Reduced QR is sufficient for solving the least squares problem

  • Full QR is useful when you need to compute the residual norm or analyze the orthogonal complement

The QR Solution

Given the reduced QR factorization $A = \hat{Q}\hat{R}$, the overdetermined system

$$Ax \approx b \quad \Longleftrightarrow \quad \hat{Q}\hat{R}x \approx b$$

is to be solved in the least-squares sense.

The least squares solution satisfies $A\hat{x} = \text{proj}_{\text{R}(A)} b = \hat{Q}\hat{Q}^T b$.

Therefore:

$$\hat{Q}\hat{R}\hat{x} = \hat{Q}\hat{Q}^T b$$

Multiply both sides by $\hat{Q}^T$:

$$\hat{R}\hat{x} = \hat{Q}^T b$$

Why This Works

Step 1: Project $b$ onto the column space of $A$:

$$\text{proj}_{\text{R}(A)} b = \hat{Q}\hat{Q}^T b$$

Step 2: Since $\text{R}(\hat{Q}) = \text{R}(A)$, there exists $\hat{x}$ with:

$$A\hat{x} = \hat{Q}\hat{R}\hat{x} = \hat{Q}\hat{Q}^T b$$

Step 3: Multiply by $\hat{Q}^T$ (using $\hat{Q}^T\hat{Q} = I$):

$$\hat{R}\hat{x} = \hat{Q}^T b$$
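A quick numerical sanity check of Steps 1-3, using random data purely for illustration:

import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((8, 3))
b = rng.standard_normal(8)

Q, R = np.linalg.qr(A, mode='reduced')
x_hat = np.linalg.solve(R, Q.T @ b)            # solve R_hat x = Q_hat^T b

# A @ x_hat equals the projection of b onto R(A) = R(Q_hat)
print(np.allclose(A @ x_hat, Q @ (Q.T @ b)))   # True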

Algorithm

import numpy as np
from scipy.linalg import solve_triangular

def least_squares_qr(A, b):
    """Solve least squares via QR factorization."""
    Q, R = np.linalg.qr(A, mode='reduced')   # reduced QR: A = Q_hat R_hat
    c = Q.T @ b                              # c = Q_hat^T b
    x = solve_triangular(R, c, lower=False)  # back substitution: R_hat x = c
    return x

Steps:

  1. Compute $\hat{Q}, \hat{R} = \text{qr}(A)$

  2. Compute $c = \hat{Q}^T b$

  3. Solve $\hat{R}\hat{x} = c$ by back substitution
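As a usage example (assuming the least_squares_qr function defined above is in scope), the result can be checked against NumPy's built-in least squares solver:

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)

x_qr = least_squares_qr(A, b)                  # the routine defined above
x_ref, *_ = np.linalg.lstsq(A, b, rcond=None)  # reference solution
print(np.allclose(x_qr, x_ref))                # True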

Condition Number Comparison

| Method | Effective Condition Number |
| --- | --- |
| Normal equations | $\kappa(A)^2$ |
| QR factorization | $\kappa(A)$ |

The QR method never squares the condition number!

Dramatic Example

This example from Trefethen and Bau shows the difference:

import numpy as np

def fake_regression():
    """Ill-conditioned polynomial regression problem."""
    N = 100
    k = 14
    t = np.linspace(0, 1, N)
    X = np.column_stack([t**i for i in range(k+1)])
    y = np.exp(np.sin(4*t))
    y = y / 2006.787453080206  # normalize
    return X, y

X, y = fake_regression()

# Method 1: Normal equations (BAD!)
XtX = X.T @ X
Xty = X.T @ y
beta_normal = np.linalg.solve(XtX, Xty)

# Method 2: QR factorization (GOOD!)
Q, R = np.linalg.qr(X, mode='reduced')
beta_qr = np.linalg.solve(R, Q.T @ y)

print(f"Normal equations: beta[14] = {beta_normal[14]}")
print(f"QR factorization: beta[14] = {beta_qr[14]}")
print(f"True value:       beta[14] ≈ 1.0")

Typical output:

Normal equations: beta[14] = 0.3847...  (WRONG!)
QR factorization: beta[14] = 0.9999993... (correct to 7 figures)

Why Such a Big Difference?

For this problem, $\kappa(X)$ is on the order of $10^{9}$, so $\kappa(X^T X) = \kappa(X)^2$ is on the order of $10^{18}$.

With machine epsilon $\epsilon \approx 10^{-16}$:

| Method | Expected relative error |
| --- | --- |
| Normal equations | $\kappa(X^T X) \cdot \epsilon \approx 10^{2}$ (useless!) |
| QR factorization | $\kappa(X) \cdot \epsilon \approx 10^{-7}$ (7 digits) |
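These estimates can be checked directly. This sketch assumes the fake_regression function from the example above is in scope; note that the computed value of $\kappa(X^T X)$ may be distorted by rounding when $X^T X$ is formed:

X, y = fake_regression()
print(f"cond(X)     = {np.linalg.cond(X):.2e}")        # very large
print(f"cond(X^T X) = {np.linalg.cond(X.T @ X):.2e}")  # roughly its square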

The Residual

After solving, the residual is:

$$r = b - A\hat{x} = b - \hat{Q}\hat{Q}^T b = (I - \hat{Q}\hat{Q}^T)b$$

The residual norm squared (sum of squared errors):

$$\|r\|_2^2 = \|b\|_2^2 - \|\hat{Q}^T b\|_2^2$$
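A short check of this identity on random illustrative data:

import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((10, 4))
b = rng.standard_normal(10)

Q, R = np.linalg.qr(A, mode='reduced')
x_hat = np.linalg.solve(R, Q.T @ b)
r = b - A @ x_hat

lhs = np.linalg.norm(r) ** 2
rhs = np.linalg.norm(b) ** 2 - np.linalg.norm(Q.T @ b) ** 2
print(np.isclose(lhs, rhs))   # True (up to rounding)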

Full QR Viewpoint

Using the full QR, $A = QR$ where $Q \in \mathbb{R}^{m \times m}$:

$$Q^T b = \begin{pmatrix} c_1 \\ c_2 \end{pmatrix}$$

where $c_1 \in \mathbb{R}^n$ and $c_2 \in \mathbb{R}^{m-n}$.

The least squares solution satisfies $R_1 \hat{x} = c_1$, and:

$$\|r\|_2 = \|c_2\|_2$$

The residual norm is determined by the “extra” part $c_2$.
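The same can be verified with the full QR (mode='complete' in np.linalg.qr), again on illustrative random data:

import numpy as np

rng = np.random.default_rng(3)
m, n = 10, 4
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

Q, R = np.linalg.qr(A, mode='complete')   # full QR: Q is m x m, R is m x n
c = Q.T @ b
c1, c2 = c[:n], c[n:]

x_hat = np.linalg.solve(R[:n, :], c1)     # R_1 x_hat = c_1
r = b - A @ x_hat
print(np.isclose(np.linalg.norm(r), np.linalg.norm(c2)))   # True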

Comparison Summary

| Aspect | Normal Equations | QR Method |
| --- | --- | --- |
| Condition | $\kappa(A)^2$ | $\kappa(A)$ |
| Form $A^T A$? | Yes | No |
| Matrix-matrix product | Yes ($O(mn^2)$) | No |
| Memory | $n \times n$ matrix | $m \times n$ factors |
| Stability | Poor | Excellent |
| Cost (flops) | $mn^2 + \frac{1}{3}n^3$ | $2mn^2 - \frac{2}{3}n^3$ |

When to Use Each Method

Use normal equations when:

  • $A$ is well conditioned and speed or memory is the priority (roughly half the flops of QR when $m \gg n$, and only an $n \times n$ system to store and solve)

Use QR when:

  • $A$ is ill conditioned or its conditioning is unknown, and accuracy matters

Rule of thumb: When in doubt, use QR.

Python Best Practices

import numpy as np

# BEST: Use lstsq (automatically chooses a good method)
x, residuals, rank, s = np.linalg.lstsq(A, b, rcond=None)

# GOOD: Explicit QR
Q, R = np.linalg.qr(A)
x = np.linalg.solve(R, Q.T @ b)

# AVOID: Normal equations
# x = np.linalg.solve(A.T @ A, A.T @ b)  # Don't do this!

Summary

  • Forming $A^T A$ squares the condition number: $\kappa(A^T A) = \kappa(A)^2$, which can destroy all accuracy for ill-conditioned problems.

  • The QR approach solves $\hat{R}\hat{x} = \hat{Q}^T b$, so the error is governed by $\kappa(A)$ rather than $\kappa(A)^2$.

  • In practice, use np.linalg.lstsq or an explicit QR factorization; avoid forming the normal equations.

Further Reading

  • Trefethen, L. N. and Bau, D., Numerical Linear Algebra, SIAM, 1997 (source of the polynomial regression example above).