## The Problem with Normal Equations
Recall the normal equations:

$$
A^T A x = A^T b.
$$

The issue: forming $A^T A$ squares the condition number,

$$
\kappa(A^T A) = \kappa(A)^2.
$$

If $\kappa(A) \approx 10^8$, then $\kappa(A^T A) \approx 10^{16}$—near machine precision!
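A quick numerical check makes the squaring concrete. This is only an illustrative sketch: the matrix below is an arbitrary example constructed to have condition number about $10^6$.

```python
import numpy as np

# Construct a 50x4 matrix with singular values 1, 1e-2, 1e-4, 1e-6,
# so cond(A) is about 1e6.
rng = np.random.default_rng(0)
U, _ = np.linalg.qr(rng.standard_normal((50, 4)))   # orthonormal columns
V, _ = np.linalg.qr(rng.standard_normal((4, 4)))    # orthogonal
A = U @ np.diag([1.0, 1e-2, 1e-4, 1e-6]) @ V.T

print(f"cond(A)     = {np.linalg.cond(A):.2e}")        # ~1e6
print(f"cond(A^T A) = {np.linalg.cond(A.T @ A):.2e}")  # ~1e12: squared
```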
## Reduced vs Full QR
For least squares, we need to distinguish two forms of the QR factorization:

Remark 1 (Full vs Reduced QR). For $A \in \mathbb{R}^{m \times n}$ with $m \ge n$ and full column rank:

- **Full QR:** $A = QR$ with $Q \in \mathbb{R}^{m \times m}$ orthogonal and $R \in \mathbb{R}^{m \times n}$ upper triangular (zero below row $n$).
- **Reduced QR:** $A = \hat{Q}\hat{R}$ with $\hat{Q} \in \mathbb{R}^{m \times n}$ having orthonormal columns ($\hat{Q}^T \hat{Q} = I_n$) and $\hat{R} \in \mathbb{R}^{n \times n}$ upper triangular.
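In NumPy, both forms are available through the `mode` argument of `np.linalg.qr`. A quick shape check on an arbitrary $5 \times 3$ matrix (just a sketch) shows the difference:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))  # arbitrary tall matrix, m > n

Q_hat, R_hat = np.linalg.qr(A, mode='reduced')     # reduced: Q_hat is 5x3, R_hat is 3x3
Q_full, R_full = np.linalg.qr(A, mode='complete')  # full:    Q_full is 5x5, R_full is 5x3

print(Q_hat.shape, R_hat.shape)    # (5, 3) (3, 3)
print(Q_full.shape, R_full.shape)  # (5, 5) (5, 3)
print(np.allclose(Q_hat.T @ Q_hat, np.eye(3)))  # True: orthonormal columns
```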
## The QR Solution

Given the reduced QR factorization $A = \hat{Q}\hat{R}$:

The least squares solution satisfies the normal equations $A^T A x = A^T b$.

Therefore:

$$
\hat{R}^T \hat{Q}^T \hat{Q} \hat{R} x = \hat{R}^T \hat{Q}^T b
\quad\Longrightarrow\quad
\hat{R}^T \hat{R} x = \hat{R}^T \hat{Q}^T b.
$$

Multiply both sides by $\hat{R}^{-T}$:

$$
\hat{R} x = \hat{Q}^T b.
$$
## Why This Works

**Step 1:** Project $b$ onto the column space of $A$:

$$
P b = \hat{Q}\hat{Q}^T b.
$$

**Step 2:** Since $\hat{Q}\hat{Q}^T b \in \operatorname{range}(A)$, there exists $x$ with:

$$
A x = \hat{Q}\hat{R} x = \hat{Q}\hat{Q}^T b.
$$

**Step 3:** Multiply by $\hat{Q}^T$ (using $\hat{Q}^T \hat{Q} = I$):

$$
\hat{R} x = \hat{Q}^T b.
$$
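The projection argument can be verified numerically. The snippet below is a sketch with arbitrary random data: it checks that $Ax$ equals the projection $\hat{Q}\hat{Q}^T b$ and that the residual is orthogonal to the columns of $A$.

```python
import numpy as np
from scipy.linalg import solve_triangular

rng = np.random.default_rng(1)
A = rng.standard_normal((20, 4))
b = rng.standard_normal(20)

Q, R = np.linalg.qr(A, mode='reduced')
x = solve_triangular(R, Q.T @ b, lower=False)

# A @ x equals the orthogonal projection of b onto range(A)
print(np.allclose(A @ x, Q @ (Q.T @ b)))   # True
# The residual b - A @ x is orthogonal to range(A)
print(np.allclose(A.T @ (b - A @ x), 0))   # True
```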
## Algorithm

```python
import numpy as np
from scipy.linalg import solve_triangular

def least_squares_qr(A, b):
    """Solve least squares via QR factorization."""
    Q, R = np.linalg.qr(A, mode='reduced')
    c = Q.T @ b
    x = solve_triangular(R, c, lower=False)
    return x
```

Steps:

1. Compute the reduced QR factorization $A = \hat{Q}\hat{R}$
2. Compute $c = \hat{Q}^T b$
3. Solve $\hat{R} x = c$ by back substitution
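As a quick usage check (again just a sketch with random data, reusing `least_squares_qr` defined above), the result should match NumPy's built-in solver:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((30, 5))
b = rng.standard_normal(30)

x_qr = least_squares_qr(A, b)                  # function defined above
x_ref, *_ = np.linalg.lstsq(A, b, rcond=None)  # reference solution
print(np.allclose(x_qr, x_ref))                # True
```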
## Condition Number Comparison

| Method | Effective Condition Number |
|---|---|
| Normal equations | $\kappa(A)^2$ |
| QR factorization | $\kappa(A)$ |

The QR method never squares the condition number!
## Dramatic Example
This example from Trefethen and Bau shows the difference:
```python
import numpy as np

def fake_regression():
    """Ill-conditioned polynomial regression problem."""
    N = 100
    k = 14
    t = np.linspace(0, 1, N)
    X = np.column_stack([t**i for i in range(k+1)])
    y = np.exp(np.sin(4*t))
    y = y / 2006.787453080206  # normalize
    return X, y

X, y = fake_regression()

# Method 1: Normal equations (BAD!)
XtX = X.T @ X
Xty = X.T @ y
beta_normal = np.linalg.solve(XtX, Xty)

# Method 2: QR factorization (GOOD!)
Q, R = np.linalg.qr(X, mode='reduced')
beta_qr = np.linalg.solve(R, Q.T @ y)

print(f"Normal equations: beta[14] = {beta_normal[14]}")
print(f"QR factorization: beta[14] = {beta_qr[14]}")
print(f"True value: beta[14] ≈ 1.0")
```

Typical output:

```text
Normal equations: beta[14] = 0.3847... (WRONG!)
QR factorization: beta[14] = 0.9999993... (correct to 7 figures)
```

## Why Such a Big Difference?
For this problem, the design matrix is severely ill-conditioned: $\kappa(X)$ is on the order of $10^{10}$.

With machine epsilon $\epsilon_{\text{mach}} \approx 2.2 \times 10^{-16}$:

| Method | Expected relative error |
|---|---|
| Normal equations | $\sim \kappa(X)^2\,\epsilon_{\text{mach}} \gg 1$ (useless!) |
| QR factorization | $\sim \kappa(X)\,\epsilon_{\text{mach}}$ (7 digits) |
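These orders of magnitude can be checked directly (a sketch reusing `fake_regression` and NumPy from the example above; the exact value of $\kappa(X)$ depends on the sampling, but it is astronomically large):

```python
X, y = fake_regression()
eps = np.finfo(float).eps                 # ~2.2e-16

kappa = np.linalg.cond(X)
print(f"cond(X)         ≈ {kappa:.2e}")
print(f"cond(X)   * eps ≈ {kappa * eps:.2e}")     # QR-level error
print(f"cond(X)^2 * eps ≈ {kappa**2 * eps:.2e}")  # normal-equations-level error
```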
## The Residual

After solving, the residual is:

$$
r = b - A x = b - \hat{Q}\hat{Q}^T b = (I - \hat{Q}\hat{Q}^T)\, b.
$$

The residual norm squared (sum of squared errors):

$$
\|r\|^2 = \|b\|^2 - \|\hat{Q}^T b\|^2.
$$
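The identity is easy to confirm numerically (a sketch with arbitrary random data):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((40, 6))
b = rng.standard_normal(40)

Q, R = np.linalg.qr(A, mode='reduced')
x = np.linalg.solve(R, Q.T @ b)

r = b - A @ x
direct = np.dot(r, r)                                      # ||r||^2 computed directly
via_projection = np.dot(b, b) - np.dot(Q.T @ b, Q.T @ b)   # ||b||^2 - ||Q^T b||^2
print(np.isclose(direct, via_projection))  # True
```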
## Full QR Viewpoint

Using the full QR $A = QR$, where $Q = \begin{bmatrix} \hat{Q} & \tilde{Q} \end{bmatrix} \in \mathbb{R}^{m \times m}$:

$$
\|b - Ax\|^2 = \|Q^T(b - Ax)\|^2 = \|c_1 - \hat{R}x\|^2 + \|c_2\|^2,
$$

where $c_1 = \hat{Q}^T b \in \mathbb{R}^n$ and $c_2 = \tilde{Q}^T b \in \mathbb{R}^{m-n}$.

The least squares solution satisfies $\hat{R} x = c_1$, and:

$$
\|b - Ax\|^2 = \|c_2\|^2.
$$

The residual norm is determined by the “extra” part $c_2 = \tilde{Q}^T b$.
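A short check (sketch, random data again) confirms that the minimum residual norm equals $\|c_2\|$:

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((40, 6))
b = rng.standard_normal(40)

Q, R = np.linalg.qr(A, mode='complete')  # full QR: Q is 40x40, R is 40x6
n = A.shape[1]
c = Q.T @ b
c1, c2 = c[:n], c[n:]

x = np.linalg.solve(R[:n, :], c1)        # solve R_hat x = c1
print(np.isclose(np.linalg.norm(b - A @ x), np.linalg.norm(c2)))  # True
```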
## Comparison Summary

| Aspect | Normal Equations | QR Method |
|---|---|---|
| Condition | $\kappa(A)^2$ | $\kappa(A)$ |
| Form $A^T A$? | Yes | No |
| Matrix-matrix product | Yes ($A^T A$) | No |
| Memory | $n \times n$ matrix | $m \times n$ and $n \times n$ factors |
| Stability | Poor | Excellent |
| Cost | $\approx mn^2 + n^3/3$ flops | $\approx 2mn^2 - 2n^3/3$ flops |
## When to Use Each Method

Use normal equations when:

- $A$ is well-conditioned ($\kappa(A) \lesssim 10^3$ or so)
- $A^T A$ is already available
- Speed is critical and accuracy less so

Use QR when:

- $A$ may be ill-conditioned
- High accuracy is needed
- $A$ is sparse (specialized sparse QR methods exist)
- You’re not sure about the conditioning

Rule of thumb: When in doubt, use QR.
## Python Best Practices

```python
import numpy as np

# BEST: Use lstsq (automatically chooses a good method)
x, residuals, rank, s = np.linalg.lstsq(A, b, rcond=None)

# GOOD: Explicit QR
Q, R = np.linalg.qr(A)
x = np.linalg.solve(R, Q.T @ b)

# AVOID: Normal equations
# x = np.linalg.solve(A.T @ A, A.T @ b)  # Don't do this!
```

## Summary

- Forming $A^T A$ squares the condition number: $\kappa(A^T A) = \kappa(A)^2$, so the normal equations can lose roughly twice as many digits as the problem itself warrants.
- Solving $\hat{R} x = \hat{Q}^T b$ from the reduced QR factorization keeps the effective condition number at $\kappa(A)$ and is numerically stable.
- In practice, prefer `np.linalg.lstsq` or an explicit QR factorization; reserve the normal equations for well-conditioned problems where speed matters more than accuracy.
## Further Reading

- Trefethen & Bau, *Numerical Linear Algebra*, Lecture 11
- Golub & Van Loan, *Matrix Computations*, Chapter 5