The Problem¶
Given $A \in \mathbb{R}^{m \times n}$ with $m > n$ and $b \in \mathbb{R}^m$, the system $Ax = b$ typically has no solution.
Why? The vector $b$ usually doesn't lie in the column space of $A$.
Goal: Find $\hat{x}$ that minimizes the residual:

$$\hat{x} = \arg\min_{x \in \mathbb{R}^n} \|Ax - b\|_2^2$$
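For instance (a small system chosen here purely for illustration), with

$$A = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 1 & 1 \end{pmatrix}, \qquad b = \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix},$$

the first two equations force $x_1 = x_2 = 1$, which contradicts the third equation $x_1 + x_2 = 0$. So $Ax = b$ has no exact solution, and the best we can do is minimize the residual.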
Linear Regression Example¶
The most common application: fitting a model to data.
Example: Given $m$ data points $(t_i, y_i)$, fit a polynomial $p(t) = x_0 + x_1 t + \cdots + x_{n-1} t^{n-1}$:

$$\begin{pmatrix} 1 & t_1 & \cdots & t_1^{n-1} \\ 1 & t_2 & \cdots & t_2^{n-1} \\ \vdots & \vdots & & \vdots \\ 1 & t_m & \cdots & t_m^{n-1} \end{pmatrix} \begin{pmatrix} x_0 \\ x_1 \\ \vdots \\ x_{n-1} \end{pmatrix} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{pmatrix}$$

With $m > n$ data points, this system is overdetermined.
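A minimal sketch of setting up such a system in NumPy, assuming some example data and a quadratic fit ($n = 3$ coefficients):

```python
import numpy as np

# Example data (values chosen for illustration)
t = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 2.1, 3.9, 8.2, 15.8])

# Vandermonde-style design matrix with columns 1, t, t^2
A = np.vander(t, N=3, increasing=True)   # shape (5, 3): overdetermined

# Least squares fit of the polynomial coefficients x_0, x_1, x_2
x_hat, *_ = np.linalg.lstsq(A, y, rcond=None)
print(x_hat)
```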
Geometric Interpretation¶
The least squares solution $\hat{x}$ finds the point $A\hat{x}$ in $R(A)$ closest to $b$:
```
        b
       /|
      / |  residual r = b - Ax̂
     /  |  is perpendicular to R(A)
  Ax̂----+------ R(A)  (column space)
```

Key insight: The residual $r = b - A\hat{x}$ is orthogonal to the column space of $A$.
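A quick numerical check of this orthogonality, using an arbitrary example matrix (the data here is illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 3))   # tall matrix: 6 equations, 3 unknowns
b = rng.standard_normal(6)

# Least squares solution and its residual
x_hat, *_ = np.linalg.lstsq(A, b, rcond=None)
r = b - A @ x_hat

# The residual should be (numerically) orthogonal to every column of A
print(A.T @ r)   # entries are ~1e-15, i.e. zero up to round-off
```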
The Normal Equations¶
From the orthogonality condition $A^T(b - A\hat{x}) = 0$, we can derive:

$$A^T A \hat{x} = A^T b$$

Derivation: We need $(Ay)^T (b - A\hat{x}) = 0$ for all $y \in \mathbb{R}^n$, i.e. $y^T A^T (b - A\hat{x}) = 0$ for all $y$.
This requires $A^T (b - A\hat{x}) = 0$, giving $A^T A \hat{x} = A^T b$.
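As a sketch (with an arbitrary illustrative matrix), the normal equations can be formed and solved directly, and the result agrees with a library least squares solver:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((8, 3))
b = rng.standard_normal(8)

# Form and solve A^T A x = A^T b (fine for small, well-conditioned problems;
# see the numerical caveat later in this section)
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# Reference solution from NumPy's least squares routine
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

print(np.allclose(x_normal, x_lstsq))   # True
```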
Why Are They Called “Normal” Equations?¶
The name comes from the fact that the residual is normal (perpendicular) to the column space—not because they’re “standard” equations.
Properties of $A^T A$¶
When $A$ has full column rank:
| Property | Statement |
|---|---|
| Symmetric | $(A^T A)^T = A^T A$ |
| Positive definite | $x^T (A^T A) x = \lVert Ax \rVert_2^2 > 0$ for $x \neq 0$ |
| Invertible | Follows from positive definiteness |
| Condition number | $\kappa(A^T A) = \kappa(A)^2$ ⚠️ |
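A small numerical illustration of these properties (the matrix below is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((10, 4))
G = A.T @ A   # the Gram matrix A^T A

print(np.allclose(G, G.T))                       # symmetric
print(np.all(np.linalg.eigvalsh(G) > 0))         # positive definite: all eigenvalues > 0
print(np.linalg.cond(G), np.linalg.cond(A)**2)   # cond(A^T A) ≈ cond(A)^2
```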
Minimization Viewpoint¶
The least squares problem is equivalent to minimizing:

$$f(x) = \|Ax - b\|_2^2 = x^T A^T A x - 2 b^T A x + b^T b$$

Taking the gradient and setting it to zero:

$$\nabla f(x) = 2 A^T A x - 2 A^T b = 0$$

yields the normal equations.
Observation: This is a quadratic function with positive definite Hessian $\nabla^2 f = 2 A^T A$, so there is a unique global minimum.
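As a sanity check, the analytic gradient can be compared against finite differences (the matrix and vectors below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((7, 3))
b = rng.standard_normal(7)
x = rng.standard_normal(3)

f = lambda v: np.sum((A @ v - b) ** 2)    # f(x) = ||Ax - b||^2
grad = 2 * A.T @ (A @ x - b)              # analytic gradient 2 A^T (Ax - b)

# Central finite-difference approximation of the gradient
eps = 1e-6
fd = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps) for e in np.eye(3)])

print(np.allclose(grad, fd, atol=1e-4))   # True
```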
Multiple Linear Regression¶
In statistics notation, the least squares problem for regression is:

$$\min_{\beta} \|y - X\beta\|_2^2$$

where:

- $X \in \mathbb{R}^{n \times p}$ is the design matrix ($n$ observations of $p$ explanatory variables)
- $y \in \mathbb{R}^n$ is the response vector
- $\beta \in \mathbb{R}^p$ are the regression coefficients

The columns of $X$ typically include a column of ones (for the intercept).
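A minimal regression sketch in NumPy, assuming a small made-up dataset with one explanatory variable plus an intercept column:

```python
import numpy as np

# Made-up data: y ≈ 2 + 3*x with some noise
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.9, 8.2, 10.8, 14.1, 16.9])

# Design matrix with a column of ones for the intercept
X = np.column_stack([np.ones_like(x), x])

# Least squares estimate of beta = (intercept, slope)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)   # roughly [2.0, 3.0]
```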
The Pseudoinverse¶
When $A$ has full column rank, the matrix $A^+ = (A^T A)^{-1} A^T$ is called the Moore-Penrose pseudoinverse:
So $\hat{x} = A^+ b$.
Properties:

- $A^+ A = I_n$ (left inverse)
- $A A^+ \neq I_m$ in general (not a right inverse)
- $A A^+$ projects onto $R(A)$
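A quick check of these properties using NumPy's pseudoinverse routine (with an arbitrary full-column-rank matrix):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((6, 3))

A_plus = np.linalg.pinv(A)   # Moore-Penrose pseudoinverse, shape (3, 6)

print(np.allclose(A_plus @ A, np.eye(3)))   # True: left inverse, A+ A = I
print(np.allclose(A @ A_plus, np.eye(6)))   # False: not a right inverse

P = A @ A_plus                              # projector onto R(A)
print(np.allclose(P @ P, P), np.allclose(P, P.T))   # idempotent and symmetric
```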
Numerical Considerations¶
Forming $A^T A$ explicitly squares the condition number, $\kappa(A^T A) = \kappa(A)^2$, so solving the normal equations can lose roughly twice as many digits of accuracy as a method that works with $A$ directly. In practice, prefer a QR factorization or the SVD, which is what library least squares routines do.
Example: Polynomial Fitting¶
```python
import numpy as np

# Data points
t = np.array([0, 1, 2, 3, 4])
y = np.array([1.0, 2.1, 3.9, 8.2, 15.8])

# Design matrix for a quadratic fit: columns 1, t, t^2
X = np.column_stack([np.ones_like(t), t, t**2])

# Normal equations (don't do this!)
# beta_bad = np.linalg.solve(X.T @ X, X.T @ y)

# Better: use lstsq, which solves the problem via a stable SVD-based routine
beta, residuals, rank, s = np.linalg.lstsq(X, y, rcond=None)
```

Residual Analysis¶
After finding $\hat{x}$, the residual is:

$$r = b - A\hat{x}$$

Properties:

- $\|r\|_2^2$ is the sum of squared errors (SSE)
The residual measures how well the model fits the data.
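Continuing the polynomial fitting example above, the residual and SSE can be computed directly (a short self-contained sketch):

```python
import numpy as np

t = np.array([0, 1, 2, 3, 4])
y = np.array([1.0, 2.1, 3.9, 8.2, 15.8])
X = np.column_stack([np.ones_like(t), t, t**2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

r = y - X @ beta   # residual vector
sse = r @ r        # sum of squared errors ||r||^2
print(sse)         # small but nonzero: the quadratic does not fit exactly
print(X.T @ r)     # ~0: residual is orthogonal to the columns of X
```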
Summary¶
| Concept | Formula |
|---|---|
| Least squares problem | $\min_x \lVert Ax - b \rVert_2^2$ |
| Normal equations | $A^T A \hat{x} = A^T b$ |
| Solution (if full rank) | $\hat{x} = (A^T A)^{-1} A^T b = A^+ b$ |
| Residual | $r = b - A\hat{x}$, with $r \perp R(A)$ |
| Condition number issue | $\kappa(A^T A) = \kappa(A)^2$ |