Skip to article frontmatterSkip to article content
Site not loading correctly?

This may be due to an incorrect BASE_URL configuration. See the MyST Documentation for reference.

Background

This section develops the theory of inner products, orthogonality, and projections that underlies QR factorization and least squares. Students comfortable with these concepts from Math 235 may skim this section.

Big Idea

Orthogonality is the key to understanding least squares. The best approximation to a vector within a subspace is characterized by the residual being orthogonal to that subspace. This principle, formulated through inner products, extends far beyond Rn\mathbb{R}^n to function spaces (Hilbert spaces), where it underlies Fourier series, wavelets, and PDE theory.

Inner Product Spaces

The concept of “angle” and “orthogonality” generalizes beyond Rn\mathbb{R}^n through the abstraction of an inner product.

Definition 1 (Inner Product)

An inner product on a vector space VV (over R\mathbb{R}) is a function ,:V×VR\langle \cdot, \cdot \rangle: V \times V \to \mathbb{R} satisfying:

  1. Symmetry: x,y=y,x\langle x, y \rangle = \langle y, x \rangle

  2. Linearity: αx+βy,z=αx,z+βy,z\langle \alpha x + \beta y, z \rangle = \alpha\langle x, z \rangle + \beta\langle y, z \rangle

  3. Positive definiteness: x,x0\langle x, x \rangle \geq 0, with equality iff x=0x = 0

A vector space with an inner product is called an inner product space.

Every inner product induces a norm: x=x,x\|x\| = \sqrt{\langle x, x \rangle}.

Example 1 (Inner Products)

SpaceInner ProductInduced Norm
Rn\mathbb{R}^nx,y=xTy=ixiyi\langle x, y \rangle = x^T y = \sum_i x_i y_iEuclidean norm x2|x|_2
L2[a,b]L^2[a,b]f,g=abf(x)g(x)dx\langle f, g \rangle = \int_a^b f(x)g(x)\,dxL2L^2 norm f2|f|_2
2\ell^2 (sequences)x,y=i=1xiyi\langle x, y \rangle = \sum_{i=1}^\infty x_i y_i2\ell^2 norm

The same theorems we prove for Rn\mathbb{R}^n (Pythagorean theorem, best approximation, Gram-Schmidt) work in any inner product space. When the space is complete (Cauchy sequences converge), it’s called a Hilbert space.

Examples of Hilbert spaces:

  • Rn\mathbb{R}^n with the dot product

  • L2[a,b]L^2[a,b], the natural setting for Fourier series

  • Sobolev spaces HkH^k, the natural setting for PDEs

The finite-dimensional theory you learn here is the template for infinite-dimensional analysis.

Orthogonality

Definition 2 (Orthogonality)

Vectors x,yx, y in an inner product space are orthogonal (written xyx \perp y) if:

x,y=0\langle x, y \rangle = 0

In Rn\mathbb{R}^n: x,y=x2y2cos(θ)\langle x, y \rangle = \|x\|_2 \|y\|_2 \cos(\theta), so orthogonal means θ=±π/2\theta = \pm \pi/2, i.e., at right angles.

The Pythagorean Theorem (Generalized)

Theorem 1 (Pythagorean Theorem)

Let v,wv, w be vectors in an inner product space. If vwv \perp w, then:

v2+w2=vw2\|v\|^2 + \|w\|^2 = \|v - w\|^2

Proof 1

Using only the algebraic properties of the inner product:

vw2=vw,vw=v,vv,ww,v+w,w=v2+w2(since v,w=0)\begin{align} \|v - w\|^2 &= \langle v-w, v-w \rangle \\ &= \langle v, v \rangle - \langle v, w \rangle - \langle w, v \rangle + \langle w, w \rangle \\ &= \|v\|^2 + \|w\|^2 \quad \text{(since } \langle v, w \rangle = 0\text{)} \end{align}
Source
<Figure size 500x400 with 1 Axes>

Subspaces

Definition 3 (Subspace)

A subspace UU of an inner product space VV is a subset that is closed under addition and scalar multiplication:

  • If u,vUu, v \in U, then u+vUu + v \in U

  • If uUu \in U and αR\alpha \in \mathbb{R}, then αuU\alpha u \in U

Example 2 (Subspaces in Rn\mathbb{R}^n)

A subspace is any subset closed under addition and scalar multiplication. In Rn\mathbb{R}^n, subspaces are lines, planes, and hyperplanes through the origin.

A line through the origin is a 1D subspace: U=span{v}={αv:αR}U = \text{span}\{\mathbf{v}\} = \{\alpha \mathbf{v} : \alpha \in \mathbb{R}\}. Every scalar multiple of v\mathbf{v} lies on this line (left panel).

The range (column space) of a matrix AA is the subspace R(A)={Ax:xRn}\text{R}(A) = \{A\mathbf{x} : \mathbf{x} \in \mathbb{R}^n\}. It is spanned by the columns of AA. If AA is 2×32 \times 3 with rank 2, the three columns span all of R2\mathbb{R}^2 (right panel).

Source
<Figure size 1100x450 with 2 Axes>

Example 3 (Subspaces in L2[a,b]L^2[a,b])

Function spaces have subspaces too:

  • Polynomials of degree n\leq n: Pn=span{1,x,x2,,xn}\mathcal{P}_n = \text{span}\{1, x, x^2, \ldots, x^n\}

  • Trigonometric polynomials: span{1,cosx,sinx,cos2x,sin2x,}\text{span}\{1, \cos x, \sin x, \cos 2x, \sin 2x, \ldots\}

These are finite-dimensional subspaces of the infinite-dimensional space L2[a,b]L^2[a,b]. Approximating a function ff by an element of these subspaces is exactly polynomial interpolation (or Fourier approximation).

Orthogonal Complements

Definition 4 (Orthogonal Complement)

The orthogonal complement of a subspace UU is:

U={vV:u,v=0 for all uU}U^\perp = \{v \in V : \langle u, v \rangle = 0 \text{ for all } u \in U\}

This is the set of all vectors orthogonal to everything in UU.

Example 4 (Orthogonal Complement)

In R3\mathbb{R}^3: if U={(x,y,0)}U = \{(x, y, 0)\} (the xyxy-plane), then U={(0,0,z)}U^\perp = \{(0, 0, z)\} (the zz-axis).

Orthogonal Projection

The orthogonal projection of vv onto a unit vector uu is:

projuv=v,uu\text{proj}_u v = \langle v, u \rangle \, u

This gives the component of vv in the direction of uu.

For projection onto a subspace UU with orthonormal basis {u1,,um}\{u_1, \ldots, u_m\}:

projUv=i=1mv,uiui\text{proj}_U v = \sum_{i=1}^m \langle v, u_i \rangle \, u_i

The Best Approximation Theorem

Theorem 2 (Best Approximation)

Let UU be a subspace of an inner product space VV and xVx \in V. Then zUz \in U is the best approximation to xx in UU (minimizing xu\|x - u\| over all uUu \in U) if and only if:

xzUx - z \in U^\perp

That is, the error vector is orthogonal to the subspace.

Proof 2

Suppose zUz \in U and xzUx - z \in U^\perp. For any uUu \in U:

Since zuUz - u \in U (subspaces are closed under subtraction) and xzUx - z \perp U:

xz,zu=0\langle x - z, z - u \rangle = 0

By Pythagoras:

xz2+zu2=xu2\|x - z\|^2 + \|z - u\|^2 = \|x - u\|^2

Since zu20\|z - u\|^2 \geq 0, we have xzxu\|x - z\| \leq \|x - u\| for all uUu \in U.

Source
<Figure size 600x450 with 1 Axes>

Orthonormal Bases

Definition 5 (Orthonormal Basis)

A basis {u1,,um}\{u_1, \ldots, u_m\} for a subspace UU is orthonormal if:

  1. ui,uj=0\langle u_i, u_j \rangle = 0 for all iji \neq j (orthogonal)

  2. ui=1\|u_i\| = 1 for all ii (normalized)

Equivalently: ui,uj=δij\langle u_i, u_j \rangle = \delta_{ij} where δij\delta_{ij} is the Kronecker delta.

Why orthonormal bases are useful:

  1. Coefficients are easy: v=iciuiv = \sum_i c_i u_i implies ci=v,uic_i = \langle v, u_i \rangle

  2. Projections are simple: projUv=iv,uiui\text{proj}_U v = \sum_i \langle v, u_i \rangle \, u_i

  3. Condition number is 1: No amplification of errors

Remark 1 (The Bigger Picture: Fourier Series)

In L2[π,π]L^2[-\pi, \pi], the functions {1,cosx,sinx,cos2x,sin2x,}\{1, \cos x, \sin x, \cos 2x, \sin 2x, \ldots\} form an orthogonal basis. The Fourier coefficients an=f,cos(nx)a_n = \langle f, \cos(nx) \rangle are just projections onto basis elements. The same formula works in Rn\mathbb{R}^n and in function spaces.

Orthogonal Matrices

Definition 6 (Orthogonal Matrix)

A square matrix QRn×nQ \in \mathbb{R}^{n \times n} is orthogonal if its columns form an orthonormal basis for Rn\mathbb{R}^n.

Equivalently: QTQ=IQ^T Q = I (and thus Q1=QTQ^{-1} = Q^T).

Key properties of orthogonal matrices:

PropertyMeaning
QTQ=IQ^T Q = IColumns are orthonormal
QQT=IQ Q^T = IRows are orthonormal
Q1=QTQ^{-1} = Q^TInverse is just transpose
Qx2=x2|Qx|_2 = |x|_2Preserves lengths (isometry)
κ2(Q)=1\kappa_2(Q) = 1Perfect conditioning