This section develops the theory of inner products, orthogonality, and projections that underlies QR factorization and least squares. Students comfortable with these concepts from Math 235 may skim this section.
Orthogonality is the key to understanding least squares. The best approximation to a vector within a subspace is characterized by the residual being orthogonal to that subspace. This principle, formulated through inner products, extends far beyond to function spaces (Hilbert spaces), where it underlies Fourier series, wavelets, and PDE theory.
Inner Product Spaces¶
The concept of “angle” and “orthogonality” generalizes beyond through the abstraction of an inner product.
Definition 1 (Inner Product)
An inner product on a vector space (over ) is a function satisfying:
Symmetry:
Linearity:
Positive definiteness: , with equality iff
A vector space with an inner product is called an inner product space.
Every inner product induces a norm: .
Example 1 (Inner Products)
| Space | Inner Product | Induced Norm |
|---|---|---|
| Euclidean norm | ||
| norm | ||
| (sequences) | norm |
The same theorems we prove for (Pythagorean theorem, best approximation, Gram-Schmidt) work in any inner product space. When the space is complete (Cauchy sequences converge), it’s called a Hilbert space.
Examples of Hilbert spaces:
with the dot product
, the natural setting for Fourier series
Sobolev spaces , the natural setting for PDEs
The finite-dimensional theory you learn here is the template for infinite-dimensional analysis.
Orthogonality¶
Definition 2 (Orthogonality)
Vectors in an inner product space are orthogonal (written ) if:
In : , so orthogonal means , i.e., at right angles.
The Pythagorean Theorem (Generalized)¶
Theorem 1 (Pythagorean Theorem)
Let be vectors in an inner product space. If , then:
Proof 1
Using only the algebraic properties of the inner product:
import numpy as np
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(5, 4))
v = np.array([0, 2])
w = np.array([3, 0])
# v from origin
ax.annotate('', xy=v, xytext=[0, 0],
arrowprops=dict(arrowstyle='->', color='#1f77b4', lw=2))
# w from origin
ax.annotate('', xy=w, xytext=[0, 0],
arrowprops=dict(arrowstyle='->', color='#d62728', lw=2))
# v - w as the hypotenuse: draw from w to v
ax.annotate('', xy=v, xytext=w,
arrowprops=dict(arrowstyle='->', color='#2ca02c', lw=2, ls='--'))
# Right angle marker
sq = 0.25
ax.plot([sq, sq, 0], [0, sq, sq], 'k-', lw=0.8)
ax.text(v[0] - 0.3, v[1] / 2, r'$\mathbf{v}$', fontsize=14, color='#1f77b4')
ax.text(w[0] / 2, -0.3, r'$\mathbf{w}$', fontsize=14, color='#d62728')
ax.text(w[0] / 2 + 0.1, v[1] / 2 + 0.2, r'$\mathbf{v} - \mathbf{w}$',
fontsize=13, color='#2ca02c')
ax.set_xlim(-0.5, 4)
ax.set_ylim(-0.5, 2.5)
ax.set_aspect('equal')
ax.grid(True, alpha=0.3)
ax.set_title(r'Pythagorean theorem: $\|\mathbf{v}\|^2 + \|\mathbf{w}\|^2 = \|\mathbf{v} - \mathbf{w}\|^2$')
plt.tight_layout()
plt.show()
Subspaces¶
Definition 3 (Subspace)
A subspace of an inner product space is a subset that is closed under addition and scalar multiplication:
If , then
If and , then
Example 2 (Subspaces in )
A subspace is any subset closed under addition and scalar multiplication. In , subspaces are lines, planes, and hyperplanes through the origin.
A line through the origin is a 1D subspace: . Every scalar multiple of lies on this line (left panel).
The range (column space) of a matrix is the subspace . It is spanned by the columns of . If is with rank 2, the three columns span all of (right panel).
fig, axes = plt.subplots(1, 2, figsize=(11, 4.5))
# Left: a 1D subspace (line) in R^2
ax = axes[0]
v = np.array([2, 1])
t = np.linspace(-2, 2, 100)
ax.plot(t * v[0], t * v[1], 'b-', lw=2, alpha=0.4)
ax.annotate('', xy=v, xytext=[0, 0],
arrowprops=dict(arrowstyle='->', color='#1f77b4', lw=2))
ax.text(v[0] + 0.1, v[1] + 0.15, r'$\mathbf{v}$', fontsize=13, color='#1f77b4')
# Show scalar multiples
for s, label in [(-1, r'$-\mathbf{v}$'), (0.5, r'$\frac{1}{2}\mathbf{v}$'),
(1.5, r'$\frac{3}{2}\mathbf{v}$')]:
pt = s * v
ax.plot(pt[0], pt[1], 'bo', markersize=5)
ax.text(pt[0] + 0.15, pt[1] + 0.15, label, fontsize=10, color='#1f77b4')
ax.plot(0, 0, 'ko', markersize=4)
ax.set_xlim(-4.5, 4.5)
ax.set_ylim(-2.5, 2.5)
ax.set_aspect('equal')
ax.axhline(0, color='k', lw=0.5)
ax.axvline(0, color='k', lw=0.5)
ax.grid(True, alpha=0.3)
ax.set_title(r'$U = \mathrm{span}\{\mathbf{v}\}$: a line through the origin')
# Right: range of a 2x3 matrix
ax = axes[1]
A2 = np.array([[2, 0.5, -1], [0.3, 1.5, 1]])
cols = A2.T
colors = ['#1f77b4', '#d62728', '#2ca02c']
for i, (c, col) in enumerate(zip(cols, colors)):
ax.annotate('', xy=c, xytext=[0, 0],
arrowprops=dict(arrowstyle='->', lw=2, color=col))
ax.text(c[0] + 0.1, c[1] + 0.1, f'$\\mathbf{{a}}_{i+1}$',
fontsize=13, color=col)
# Show that a3 = combination of a1, a2 (if applicable)
ax.fill([-3, 3, 3, -3], [-3, -3, 3, 3], alpha=0.05, color='blue')
ax.text(2.0, -1.5, r'$\mathrm{R}(A) = \mathbb{R}^2$', fontsize=12,
color='blue', alpha=0.6)
ax.plot(0, 0, 'ko', markersize=4)
ax.set_xlim(-2.5, 2.5)
ax.set_ylim(-2, 2)
ax.set_aspect('equal')
ax.axhline(0, color='k', lw=0.5)
ax.axvline(0, color='k', lw=0.5)
ax.grid(True, alpha=0.3)
ax.set_title(r'$\mathrm{R}(A) = \mathrm{span}\{\mathbf{a}_1, \mathbf{a}_2, \mathbf{a}_3\}$')
plt.tight_layout()
plt.show()
Example 3 (Subspaces in )
Function spaces have subspaces too:
Polynomials of degree :
Trigonometric polynomials:
These are finite-dimensional subspaces of the infinite-dimensional space . Approximating a function by an element of these subspaces is exactly polynomial interpolation (or Fourier approximation).
Orthogonal Complements¶
Definition 4 (Orthogonal Complement)
The orthogonal complement of a subspace is:
This is the set of all vectors orthogonal to everything in .
Example 4 (Orthogonal Complement)
In : if (the -plane), then (the -axis).
Orthogonal Projection¶
The orthogonal projection of onto a unit vector is:
This gives the component of in the direction of .
For projection onto a subspace with orthonormal basis :
The Best Approximation Theorem¶
Theorem 2 (Best Approximation)
Let be a subspace of an inner product space and . Then is the best approximation to in (minimizing over all ) if and only if:
That is, the error vector is orthogonal to the subspace.
Proof 2
fig, ax = plt.subplots(figsize=(6, 4.5))
# Subspace U (a line through origin)
u_dir = np.array([1, 0.3])
u_dir = u_dir / np.linalg.norm(u_dir)
t = np.linspace(-0.5, 4, 100)
ax.plot(t * u_dir[0], t * u_dir[1], 'k-', lw=1.5, alpha=0.4)
ax.text(3.5 * u_dir[0] + 0.1, 3.5 * u_dir[1] - 0.3, '$U$', fontsize=14)
# Point x
x = np.array([2.0, 2.5])
# Projection z = proj_U(x)
z = np.dot(x, u_dir) * u_dir
# Draw vectors
ax.annotate('', xy=x, xytext=[0, 0],
arrowprops=dict(arrowstyle='->', color='#1f77b4', lw=2))
ax.annotate('', xy=z, xytext=[0, 0],
arrowprops=dict(arrowstyle='->', color='#d62728', lw=2))
# Residual x - z
ax.annotate('', xy=x, xytext=z,
arrowprops=dict(arrowstyle='->', color='#2ca02c', lw=2, ls='--'))
# Right angle marker at z
perp = x - z
perp_n = perp / np.linalg.norm(perp)
u_n = u_dir
sq = 0.15
corner = z + sq * u_n
corner2 = z + sq * u_n + sq * perp_n
corner3 = z + sq * perp_n
ax.plot([corner[0], corner2[0], corner3[0]],
[corner[1], corner2[1], corner3[1]], 'k-', lw=0.8)
ax.text(x[0] + 0.1, x[1] + 0.1, r'$\mathbf{x}$', fontsize=14, color='#1f77b4')
ax.text(z[0] + 0.1, z[1] - 0.3, r'$\mathbf{z} = \mathrm{proj}_U \mathbf{x}$',
fontsize=13, color='#d62728')
mid = (x + z) / 2
ax.text(mid[0] + 0.15, mid[1], r'$\mathbf{x} - \mathbf{z} \perp U$',
fontsize=13, color='#2ca02c')
ax.plot(0, 0, 'ko', markersize=4)
ax.set_xlim(-0.5, 4)
ax.set_ylim(-0.5, 3)
ax.set_aspect('equal')
ax.grid(True, alpha=0.3)
ax.set_title('Best approximation: the residual is orthogonal to $U$')
plt.tight_layout()
plt.show()
Orthonormal Bases¶
Definition 5 (Orthonormal Basis)
A basis for a subspace is orthonormal if:
for all (orthogonal)
for all (normalized)
Equivalently: where is the Kronecker delta.
Why orthonormal bases are useful:
Coefficients are easy: implies
Projections are simple:
Condition number is 1: No amplification of errors
Remark 1 (The Bigger Picture: Fourier Series)
In , the functions form an orthogonal basis. The Fourier coefficients are just projections onto basis elements. The same formula works in and in function spaces.
Orthogonal Matrices¶
Definition 6 (Orthogonal Matrix)
A square matrix is orthogonal if its columns form an orthonormal basis for .
Equivalently: (and thus ).
Key properties of orthogonal matrices:
| Property | Meaning |
|---|---|
| Columns are orthonormal | |
| Rows are orthonormal | |
| Inverse is just transpose | |
| Preserves lengths (isometry) | |
| Perfect conditioning |