Least Squares Approximation#
Find the least squares approximation of the system \(A \boldsymbol{x} \approx \boldsymbol{b}\) by minimizing the distance \(\| A \boldsymbol{x} - \boldsymbol{b}\|\). There are several methods for computing the approximation, including the normal equations and the QR equations.

Definition#
Let \(A\) be an \(m \times n\) matrix with \(m > n\) and \(\mathrm{rank}(A) = n\). The least squares approximation of the system \(A \boldsymbol{x} \approx \boldsymbol{b}\) is the vector \(\boldsymbol{x}\) which minimizes the distance \(\| A\boldsymbol{x} - \boldsymbol{b} \|\).
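As a concrete illustration of the definition, here is a minimal NumPy sketch using made-up example data: `numpy.linalg.lstsq` computes the minimizer of \(\| A\boldsymbol{x} - \boldsymbol{b} \|\), and perturbing that solution can only increase the residual.

```python
import numpy as np

# Made-up example data: a 4x2 overdetermined system with m > n and rank(A) = n
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0, 2.0, 4.0])

# numpy.linalg.lstsq returns the vector x minimizing || A x - b ||
x, *_ = np.linalg.lstsq(A, b, rcond=None)

# Any other vector gives a residual at least as large
x_other = x + np.array([0.1, -0.2])
print(np.linalg.norm(A @ x - b) <= np.linalg.norm(A @ x_other - b))  # True
```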
Normal Equations#
Let \(U \subseteq \mathbb{R}^n\) be a subspace and let \(\{ \boldsymbol{u}_1, \dots, \boldsymbol{u}_m \}\) be a basis (not necessarily orthogonal). Assemble the matrix \(A\) by setting its columns equal to the basis vectors of \(U\). Then projecting a vector \(\boldsymbol{x}\) onto \(U\) is equivalent to determining the linear combination \(A\boldsymbol{y}\) closest to \(\boldsymbol{x}\). From the geometry we expect the error \(\boldsymbol{e} = \boldsymbol{x} - A\boldsymbol{y}\) to be perpendicular to the subspace \(U\), or equivalently, perpendicular to all the basis vectors of \(U\).
Assembling these conditions into matrix form, the requirement that \(\boldsymbol{u}_i^T (\boldsymbol{x} - A\boldsymbol{y}) = 0\) for each basis vector becomes

$$
A^T ( \boldsymbol{x} - A \boldsymbol{y} ) = \boldsymbol{0}
\qquad \Longleftrightarrow \qquad
A^T A \, \boldsymbol{y} = A^T \boldsymbol{x}
$$
Note that \(U = R(A)\) and so \(\boldsymbol{e} \in R(A)^\perp = N(A^T)\).
Is \(A^T A\) invertible? Yes: the columns of \(A\) are linearly independent since they form a basis of \(U\), therefore \(A^T A\) has full rank and is invertible.
This means that we can determine \(\boldsymbol{y}\) uniquely by solving the above linear system. In particular,

$$
\boldsymbol{y} = \left( A^T A \right)^{-1} A^T \boldsymbol{x}
\qquad \text{and so} \qquad
\mathrm{proj}_U(\boldsymbol{x}) = A \boldsymbol{y} = A \left( A^T A \right)^{-1} A^T \boldsymbol{x}
$$
Thus the projection matrix onto the subspace \(U\) is given by \(P := A\left(A^T A \right)^{-1} A^T\).
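A short NumPy sketch of this construction, with made-up basis vectors: form \(P = A(A^TA)^{-1}A^T\), project an example vector, and check that \(P\) is idempotent and that the error is orthogonal to the columns of \(A\).

```python
import numpy as np

# Example subspace U of R^3 spanned by two (non-orthogonal) basis vectors
A = np.array([[1.0, 1.0],
              [0.0, 1.0],
              [1.0, 0.0]])

# Projection matrix P = A (A^T A)^{-1} A^T
P = A @ np.linalg.inv(A.T @ A) @ A.T

x = np.array([1.0, 2.0, 3.0])
proj = P @ x            # projection of x onto U
e = x - proj            # error vector

print(np.allclose(P @ P, P))        # P is idempotent
print(np.allclose(A.T @ e, 0.0))    # error is orthogonal to U
```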
Let \(A\) be an \(m \times n\) matrix with \(m > n\) and \(\mathrm{rank}(A) = n\). The least squares approximation of the system \(A \boldsymbol{x} \approx \boldsymbol{b}\) is the unique solution of the system

$$
A^T A \boldsymbol{x} = A^T \boldsymbol{b}
$$
The system is called the normal equations.
Proof
If \(\boldsymbol{x} \in \mathbb{R}^n\), then \(A \boldsymbol{x} \in R(A)\). The projection theorem states that the point in \(R(A)\) nearest to \(\boldsymbol{b} \in \mathbb{R}^m\) is the orthogonal projection of \(\boldsymbol{b}\) onto \(R(A)\). If \(\boldsymbol{x}\) is the vector such that \(A\boldsymbol{x} = \mathrm{proj}_{R(A)}(\boldsymbol{b})\), then \(A\boldsymbol{x} - \boldsymbol{b}\) is in \(R(A)^{\perp} = N(A^T)\) and therefore

$$
A^T ( A \boldsymbol{x} - \boldsymbol{b} ) = \boldsymbol{0}
\qquad \Longleftrightarrow \qquad
A^T A \boldsymbol{x} = A^T \boldsymbol{b}
$$
We assume \(\mathrm{rank}(A) = n\), therefore \(A^TA\) is nonsingular and the solution exists and is unique.
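A minimal NumPy sketch of solving the normal equations for an arbitrary made-up system (note that \(A^TA\) can be ill-conditioned in practice, which is one motivation for the QR approach below):

```python
import numpy as np

# Made-up overdetermined example system with rank(A) = n
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
b = np.array([1.0, 3.0, 2.0, 5.0])

# Solve the normal equations A^T A x = A^T b
x = np.linalg.solve(A.T @ A, A.T @ b)

# Agrees with NumPy's built-in least squares solver
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(x, x_lstsq))  # True
```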
QR Equations#
Let \(A\) be an \(m \times n\) matrix with \(m > n\) and \(\mathrm{rank}(A) = n\). The least squares approximation of the system \(A \boldsymbol{x} \approx \boldsymbol{b}\) is the solution of the system of equations

$$
R_1 \boldsymbol{x} = Q_1^T \boldsymbol{b}
$$

where \(A = Q_1 R_1\) is the thin QR decomposition. The system is called the QR equations. Furthermore, the residual is given by

$$
\| A \boldsymbol{x} - \boldsymbol{b} \| = \| Q_2^T \boldsymbol{b} \|
$$

where \(A = QR\) is the QR decomposition with \(Q = [ Q_1 \ \ Q_2 ]\).
Proof
The matrix \(Q\) is orthogonal, therefore

$$
\| A \boldsymbol{x} - \boldsymbol{b} \|^2
= \| Q^T ( A \boldsymbol{x} - \boldsymbol{b} ) \|^2
= \left\| \begin{bmatrix} R_1 \boldsymbol{x} - Q_1^T \boldsymbol{b} \\ - Q_2^T \boldsymbol{b} \end{bmatrix} \right\|^2
= \| R_1 \boldsymbol{x} - Q_1^T \boldsymbol{b} \|^2 + \| Q_2^T \boldsymbol{b} \|^2
$$

where we use the Pythagorean theorem in the last equality. The vector \(Q_2^T \boldsymbol{b}\) does not depend on \(\boldsymbol{x}\), therefore the minimum value of \(\| A \boldsymbol{x} - \boldsymbol{b} \|\) occurs when \(R_1 \boldsymbol{x} = Q_1^T \boldsymbol{b}\), and the residual is \(\| A \boldsymbol{x} - \boldsymbol{b} \| = \| Q_2^T \boldsymbol{b} \|\).
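A sketch of the QR equations in NumPy/SciPy, reusing the made-up example system from the normal equations sketch above: the thin QR decomposition gives \(Q_1\) and \(R_1\), and back substitution solves \(R_1\boldsymbol{x} = Q_1^T\boldsymbol{b}\).

```python
import numpy as np
from scipy.linalg import solve_triangular

# Made-up overdetermined example system
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
b = np.array([1.0, 3.0, 2.0, 5.0])

# Thin QR decomposition: Q1 is m x n, R1 is n x n upper triangular
Q1, R1 = np.linalg.qr(A)                     # default mode='reduced'
x = solve_triangular(R1, Q1.T @ b)           # back substitution for R1 x = Q1^T b

# Check the residual formula || A x - b || = || Q2^T b ||
Q, R = np.linalg.qr(A, mode='complete')
Q2 = Q[:, A.shape[1]:]
print(np.isclose(np.linalg.norm(A @ x - b), np.linalg.norm(Q2.T @ b)))  # True
```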
Fitting Models to Data#
Suppose we have \(m\) points

$$
(t_1,y_1), (t_2,y_2), \dots, (t_m,y_m)
$$

and we want to find a line

$$
y = c_0 + c_1 t
$$

that “best fits” the data. There are different ways to quantify what “best fits” means but the most common method is called least squares linear regression. In least squares linear regression, we want to minimize the sum of squared errors

$$
SSE = \sum_{i=1}^m \left( y_i - (c_0 + c_1 t_i) \right)^2
$$

In matrix notation, the sum of squared errors is

$$
SSE = \| A \boldsymbol{c} - \boldsymbol{y} \|^2
$$

where

$$
A = \begin{bmatrix} 1 & t_1 \\ 1 & t_2 \\ \vdots & \vdots \\ 1 & t_m \end{bmatrix}
\qquad
\boldsymbol{c} = \begin{bmatrix} c_0 \\ c_1 \end{bmatrix}
\qquad
\boldsymbol{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix}
$$
We assume that \(m \geq 2\) and \(t_i \not= t_j\) for all \(i \not= j\), which implies \(\mathrm{rank}(A) = 2\). Therefore the vector of coefficients \(\boldsymbol{c}\) is the unique least squares approximation of the system \(A \boldsymbol{c} \approx \boldsymbol{y}\). See Wikipedia:Simple linear regression.
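A sketch of least squares linear regression in NumPy for some made-up data points \((t_i, y_i)\): build the \(m \times 2\) matrix \(A\), solve the normal equations, and read off the intercept and slope.

```python
import numpy as np

# Made-up data points (t_i, y_i)
t = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 1.9, 3.2, 3.8, 5.1])

# Design matrix: a column of ones for the intercept and a column of t values for the slope
A = np.column_stack([np.ones(t.size), t])

# Least squares coefficients c = [c0, c1] from the normal equations
c = np.linalg.solve(A.T @ A, A.T @ y)
print(c)   # intercept c0 and slope c1 of the best fit line y = c0 + c1 t
```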
More generally, given \(m\) data points

$$
(t_1,y_1), (t_2,y_2), \dots, (t_m,y_m)
$$

and a model function \(f(t,\boldsymbol{c})\) which depends on parameters \(c_1,\dots,c_n\), the least squares data fitting problem consists of computing parameters \(c_1,\dots,c_n\) which minimize the sum of squared errors

$$
SSE = \sum_{i=1}^m \left( y_i - f(t_i,\boldsymbol{c}) \right)^2
$$

If the model function is of the form

$$
f(t,\boldsymbol{c}) = c_1 f_1(t) + c_2 f_2(t) + \cdots + c_n f_n(t)
$$

for some functions \(f_1(t),\dots,f_n(t)\), then we say the data fitting problem is linear (but note the functions \(f_1,\dots,f_n\) are not necessarily linear). In the linear case, we use matrix notation to write the sum of squared errors as

$$
SSE = \| A \boldsymbol{c} - \boldsymbol{y} \|^2
$$

where

$$
A = \begin{bmatrix} f_1(t_1) & f_2(t_1) & \cdots & f_n(t_1) \\ f_1(t_2) & f_2(t_2) & \cdots & f_n(t_2) \\ \vdots & \vdots & & \vdots \\ f_1(t_m) & f_2(t_m) & \cdots & f_n(t_m) \end{bmatrix}
\qquad
\boldsymbol{c} = \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{bmatrix}
\qquad
\boldsymbol{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix}
$$

We assume that \(m \geq n\) and the functions \(f_1,\dots,f_n\) are linearly independent (which implies \(\mathrm{rank}(A) = n\)). Therefore the vector of coefficients \(\boldsymbol{c}\) is the least squares approximation of the system \(A \boldsymbol{c} \approx \boldsymbol{y}\).
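For instance, a quadratic model \(f(t,\boldsymbol{c}) = c_1 + c_2 t + c_3 t^2\) is a linear data fitting problem with basis functions \(1, t, t^2\). A short NumPy sketch with made-up data:

```python
import numpy as np

# Made-up data points (t_i, y_i)
t = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 2.5])
y = np.array([1.0, 0.4, 0.3, 0.8, 2.1, 4.0])

# Basis functions f_1(t) = 1, f_2(t) = t, f_3(t) = t^2 (not linear in t,
# but the model is linear in the coefficients c)
A = np.column_stack([np.ones(t.size), t, t**2])

# Least squares coefficients via NumPy's built-in solver
c, *_ = np.linalg.lstsq(A, y, rcond=None)
print(c)   # coefficients c1, c2, c3 of the best fit parabola
```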
Exercises#
Let \(A = QR\) where
Find the least squares approximation \(A\boldsymbol{x} \approx \boldsymbol{b}\) where
Solution
Set up (but do not solve) a linear system \(A \boldsymbol{c} = \boldsymbol{y}\) where the solution is the coefficient vector
such that the function
best fits the data \((0,1),(1/4,3),(1/2,2),(3/4,-1),(1,0)\).