# Orthogonality and Projections
Orthogonality is the key to understanding least squares. The best approximation to a vector within a subspace is characterized by the residual being orthogonal to that subspace. This principle, formulated through inner products, extends far beyond $\mathbb{R}^n$ to function spaces (Hilbert spaces), where it underlies Fourier series, wavelets, and PDE theory.
## Inner Product Spaces

The concepts of "angle" and "orthogonality" generalize beyond $\mathbb{R}^n$ through the abstraction of an inner product.
An inner product on a vector space $V$ (over $\mathbb{R}$) is a function $\langle \cdot, \cdot \rangle : V \times V \to \mathbb{R}$ satisfying:
- Symmetry: $\langle x, y \rangle = \langle y, x \rangle$
- Linearity: $\langle \alpha x + \beta y, z \rangle = \alpha\langle x, z \rangle + \beta\langle y, z \rangle$
- Positive definiteness: $\langle x, x \rangle \geq 0$, with equality iff $x = 0$
A vector space equipped with an inner product is called an inner product space.

Every inner product induces a norm: $\|x\| = \sqrt{\langle x, x \rangle}$.
Example 1 (Inner Products)
| Space | Inner Product | Induced Norm |
|---|---|---|
| $\mathbb{R}^n$ | $\langle x, y \rangle = x^T y = \sum_i x_i y_i$ | Euclidean norm $\lVert x \rVert_2$ |
| $L^2[a,b]$ | $\langle f, g \rangle = \int_a^b f(x)g(x)\,dx$ | $L^2$ norm $\lVert f \rVert_2$ |
| $\ell^2$ (sequences) | $\langle x, y \rangle = \sum_{i=1}^\infty x_i y_i$ | $\ell^2$ norm |
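As a quick numerical illustration (a sketch using NumPy and SciPy; the specific vectors, functions, and interval are arbitrary choices), the same inner-product idea covers both dot products in $\mathbb{R}^n$ and integrals in $L^2[a,b]$:

```python
import numpy as np
from scipy.integrate import quad

# Inner product on R^n: the usual dot product
x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, -1.0, 2.0])
print(x @ y)                      # <x, y> = 1*4 + 2*(-1) + 3*2 = 8
print(np.sqrt(x @ x))             # induced norm ||x||_2

# Inner product on L^2[0, 1]: <f, g> = integral of f(x) g(x) dx
f = lambda t: np.sin(np.pi * t)
g = lambda t: t
fg, _ = quad(lambda t: f(t) * g(t), 0.0, 1.0)
ff, _ = quad(lambda t: f(t) ** 2, 0.0, 1.0)
print(fg)                         # <f, g> = 1/pi ≈ 0.3183
print(np.sqrt(ff))                # ||f||_2 = 1/sqrt(2) ≈ 0.7071
```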
The same theorems we prove for $\mathbb{R}^n$ (Pythagorean theorem, best approximation, Gram-Schmidt) work in any inner product space. When the space is complete (every Cauchy sequence converges), it is called a Hilbert space.
Examples of Hilbert spaces:
- $\mathbb{R}^n$ with the dot product
- $L^2[a,b]$, the natural setting for Fourier series
- Sobolev spaces $H^k$, the natural setting for PDEs
The finite-dimensional theory you learn here is the template for infinite-dimensional analysis.
## Orthogonality

Two vectors are orthogonal, written $x \perp y$, if $\langle x, y \rangle = 0$. In $\mathbb{R}^n$ we have $\langle x, y \rangle = \|x\|_2 \|y\|_2 \cos(\theta)$, so orthogonal means $\theta = \pm \pi/2$: the vectors meet at right angles.
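A minimal check of this relationship in NumPy (the vectors are arbitrary):

```python
import numpy as np

# Angle between two vectors from <x, y> = ||x|| ||y|| cos(theta)
x = np.array([1.0, 1.0, 0.0])
y = np.array([1.0, -1.0, 0.0])
cos_theta = (x @ y) / (np.linalg.norm(x) * np.linalg.norm(y))
theta = np.arccos(np.clip(cos_theta, -1.0, 1.0))
print(np.degrees(theta))   # 90.0 -- x and y are orthogonal since <x, y> = 0
```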
### The Pythagorean Theorem (Generalized)

Let $v, w$ be vectors in an inner product space. If $v \perp w$, then:
$$\|v\|^2 + \|w\|^2 = \|v - w\|^2$$

Proof 1
Using only the algebraic properties of the inner product:
$$
\begin{aligned}
\|v - w\|^2 &= \langle v-w, v-w \rangle \\
&= \langle v, v \rangle - \langle v, w \rangle - \langle w, v \rangle + \langle w, w \rangle \\
&= \|v\|^2 + \|w\|^2 \quad \text{(since } \langle v, w \rangle = 0\text{)}
\end{aligned}
$$

The familiar right-triangle picture in 2D captures the high-dimensional truth.
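A small numerical sanity check of the identity (the vectors below are arbitrary orthogonal choices):

```python
import numpy as np

# Verify the Pythagorean identity for a pair of orthogonal vectors in R^3
v = np.array([2.0, 1.0, 0.0])
w = np.array([-1.0, 2.0, 5.0])         # <v, w> = -2 + 2 + 0 = 0, so v ⟂ w
assert np.isclose(v @ w, 0.0)

lhs = np.linalg.norm(v) ** 2 + np.linalg.norm(w) ** 2
rhs = np.linalg.norm(v - w) ** 2
print(lhs, rhs)                        # both 35.0
```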
## Subspaces

A subspace of a vector space is a subset that is closed under addition and scalar multiplication. Examples in $\mathbb{R}^n$:
- $\{(x, y, 0) : x, y \in \mathbb{R}\}$ is a subspace of $\mathbb{R}^3$ (the $xy$-plane)
- $\text{span}\{v_1, \ldots, v_k\} = \{\alpha_1 v_1 + \cdots + \alpha_k v_k\}$
- The range of a matrix: $\text{R}(A) = \{Ax : x \in \mathbb{R}^n\}$
Examples in $L^2[a,b]$:

- Polynomials of degree $\leq n$
- Trigonometric polynomials: $\text{span}\{1, \cos x, \sin x, \cos 2x, \sin 2x, \ldots\}$
## Orthogonal Complements

The orthogonal complement of a subspace $U$ is $U^\perp = \{v \in V : \langle v, u \rangle = 0 \text{ for all } u \in U\}$: the set of all vectors orthogonal to everything in $U$.
Example in $\mathbb{R}^3$: If $U = \{(x, y, 0)\}$ (the $xy$-plane), then $U^\perp = \{(0, 0, z)\}$ (the $z$-axis).
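One convenient way to compute an orthogonal complement numerically is as the null space of $A^T$, where the columns of $A$ span $U$; a sketch using SciPy for the example above:

```python
import numpy as np
from scipy.linalg import null_space

# Columns of A span U = the xy-plane in R^3
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])

# U^perp = null(A^T): vectors v with A^T v = 0, i.e. <v, u> = 0 for all u in U
U_perp = null_space(A.T)
print(U_perp)           # one column spanning the z-axis, e.g. [[0.], [0.], [1.]]
```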
## Orthogonal Projection

The orthogonal projection of $v$ onto a unit vector $u$ is:
$$\text{proj}_u v = \langle v, u \rangle \, u$$

This gives the component of $v$ in the direction of $u$.
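A minimal sketch of this rank-one projection in NumPy (the vectors are arbitrary choices):

```python
import numpy as np

# Project v onto the direction of a (normalize a first so the formula applies)
v = np.array([3.0, 4.0, 0.0])
a = np.array([1.0, 1.0, 1.0])
u = a / np.linalg.norm(a)          # unit vector

proj = (v @ u) * u                 # proj_u(v) = <v, u> u
residual = v - proj
print(proj)                        # component of v along u
print(residual @ u)                # ~0: the residual is orthogonal to u
```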
For projection onto a subspace $U$ with orthonormal basis $\{u_1, \ldots, u_m\}$:
$$\text{proj}_U v = \sum_{i=1}^m \langle v, u_i \rangle \, u_i$$

### The Best Approximation Theorem

Let $U$ be a subspace of an inner product space $V$ and $x \in V$. Then $z \in U$ is the best approximation to $x$ in $U$ (minimizing $\|x - u\|$ over all $u \in U$) if and only if:
$$x - z \in U^\perp$$

That is, the error vector is orthogonal to the subspace.
Proof 2
Suppose $z \in U$ and $x - z \in U^\perp$; we show that $z$ is the best approximation. For any $u \in U$:
Since $z - u \in U$ (subspaces are closed under subtraction) and $x - z \perp U$:
$$\langle x - z, z - u \rangle = 0$$

Writing $x - u = (x - z) + (z - u)$, the Pythagorean theorem gives:
$$\|x - z\|^2 + \|z - u\|^2 = \|x - u\|^2$$

Since $\|z - u\|^2 \geq 0$, we have $\|x - z\| \leq \|x - u\|$ for all $u \in U$. ∎
Geometric Picture:
        x
       /|
      / | (x - z) ⟂ U
     /  |
    z---+---- U (subspace)

## Orthonormal Bases

A basis $\{u_1, \ldots, u_m\}$ for a subspace $U$ is orthonormal if:
- $\langle u_i, u_j \rangle = 0$ for all $i \neq j$ (orthogonal)
- $\|u_i\| = 1$ for all $i$ (normalized)
Equivalently: $\langle u_i, u_j \rangle = \delta_{ij}$, where $\delta_{ij}$ is the Kronecker delta.
Why orthonormal bases are useful:
- Coefficients are easy: $v = \sum_i c_i u_i$ implies $c_i = \langle v, u_i \rangle$
- Projections are simple: $\text{proj}_U v = \sum_i \langle v, u_i \rangle \, u_i$ (see the sketch after this list)
- Condition number is 1: no amplification of errors
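A short sketch of the first two points in NumPy, using a hand-built orthonormal basis for a plane in $\mathbb{R}^3$ (the basis and test vector are arbitrary choices):

```python
import numpy as np

# An orthonormal basis for a 2D subspace U of R^3 (columns of Q)
Q = np.column_stack([
    np.array([1.0, 1.0, 0.0]) / np.sqrt(2),
    np.array([1.0, -1.0, 0.0]) / np.sqrt(2),
])

v = np.array([3.0, 1.0, 4.0])

# Coefficients are just inner products: c_i = <v, u_i>
c = Q.T @ v

# Projection onto U: sum of c_i u_i  (equivalently Q Q^T v)
proj = Q @ c
print(proj)                 # [3., 1., 0.]  -- the xy-plane part of v
print((v - proj) @ Q)       # ~[0., 0.]: the residual is orthogonal to U
```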
In $L^2[-\pi, \pi]$, the functions $\{1, \cos x, \sin x, \cos 2x, \sin 2x, \ldots\}$ form an orthogonal basis. The Fourier coefficients $a_n = \langle f, \cos(nx) \rangle$ (up to normalization, since these basis functions are orthogonal but not unit length) are just projections onto basis elements; the same formula works in $\mathbb{R}^n$ and in function spaces!
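As an illustration, Fourier cosine coefficients can be computed as normalized projections by numerical integration; this sketch uses `scipy.integrate.quad` and the test function $f(x) = x^2$, both our own choices:

```python
import numpy as np
from scipy.integrate import quad

# Fourier cosine coefficients of f on [-pi, pi] as normalized projections:
# a_n = <f, cos(nx)> / ||cos(nx)||^2 = (1/pi) * integral of f(x) cos(nx) dx
f = lambda x: x ** 2

def a(n):
    val, _ = quad(lambda x: f(x) * np.cos(n * x), -np.pi, np.pi)
    return val / np.pi

print([round(a(n), 4) for n in range(1, 5)])
# [-4.0, 1.0, -0.4444, 0.25]  -- matches a_n = 4(-1)^n / n^2 for f(x) = x^2
```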
## Orthogonal Matrices

A square matrix $Q \in \mathbb{R}^{n \times n}$ is orthogonal if its columns form an orthonormal basis for $\mathbb{R}^n$.
Equivalently: $Q^T Q = I$ (and thus $Q^{-1} = Q^T$).
Key properties of orthogonal matrices:
| Property | Meaning |
|---|---|
| $Q^T Q = I$ | Columns are orthonormal |
| $Q Q^T = I$ | Rows are orthonormal |
| $Q^{-1} = Q^T$ | Inverse is just the transpose |
| $\lVert Qx \rVert_2 = \lVert x \rVert_2$ | Preserves lengths (isometry) |
| $\kappa_2(Q) = 1$ | Perfect conditioning |
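These properties are easy to check numerically; a sketch using a $Q$ obtained from NumPy's QR factorization of a random matrix:

```python
import numpy as np

rng = np.random.default_rng(0)

# Get an orthogonal matrix Q from the QR factorization of a random matrix
A = rng.standard_normal((4, 4))
Q, _ = np.linalg.qr(A)

x = rng.standard_normal(4)

print(np.allclose(Q.T @ Q, np.eye(4)))                        # True: columns orthonormal
print(np.allclose(Q @ Q.T, np.eye(4)))                        # True: rows orthonormal
print(np.allclose(np.linalg.norm(Q @ x), np.linalg.norm(x)))  # True: lengths preserved
print(np.linalg.cond(Q))                                      # ~1.0: perfect conditioning
```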