
Practice QR factorization and least squares.


Q9.1: Orthogonality

Let $u = \begin{pmatrix} 1 \\ 2 \\ 2 \end{pmatrix}$ and $v = \begin{pmatrix} 2 \\ 1 \\ -2 \end{pmatrix}$.


Q9.2: Gram-Schmidt by Hand

Apply the Gram-Schmidt algorithm to the vectors:

$$a_1 = \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}, \quad a_2 = \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}, \quad a_3 = \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix}$$
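
If you want to verify your hand computation, here is a minimal sketch of classical Gram-Schmidt in NumPy (not the modified variant); `a1, a2, a3` are the vectors above and the orthonormal outputs are collected in `q`:

```python
import numpy as np

a = [np.array([1., 1., 0.]), np.array([1., 0., 1.]), np.array([0., 1., 1.])]

q = []
for ai in a:
    v = ai.copy()
    for qj in q:                     # subtract components along previous q's
        v -= (qj @ ai) * qj
    q.append(v / np.linalg.norm(v))  # normalize

print(np.round(np.column_stack(q), 4))
```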

Q9.3: QR Factorization

For the matrix $A = \begin{pmatrix} 1 & 1 \\ 1 & 0 \\ 0 & 1 \end{pmatrix}$:
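
A quick numerical cross-check once you have worked this by hand (note that `np.linalg.qr` returns the reduced factorization, and its sign convention may differ from yours by a factor of $-1$ per column):

```python
import numpy as np

A = np.array([[1., 1.],
              [1., 0.],
              [0., 1.]])
Q, R = np.linalg.qr(A)        # reduced QR: Q is 3x2, R is 2x2
print(Q)
print(R)
print(np.allclose(Q @ R, A))  # should reconstruct A
```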


Q9.4: Normal Equations

Consider the least squares problem $\min_x \|Ax - b\|_2$ with:

$$A = \begin{pmatrix} 1 & 1 \\ 1 & 2 \\ 1 & 3 \end{pmatrix}, \quad b = \begin{pmatrix} 1 \\ 2 \\ 2 \end{pmatrix}$$
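
To check a hand solution of the normal equations $A^T A x = A^T b$, a minimal NumPy sketch:

```python
import numpy as np

A = np.array([[1., 1.], [1., 2.], [1., 3.]])
b = np.array([1., 2., 2.])

# Form and solve the normal equations A^T A x = A^T b
x = np.linalg.solve(A.T @ A, A.T @ b)
print(x)          # least squares coefficients
print(b - A @ x)  # residual, orthogonal to the columns of A
```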

Q9.5: QR for Least Squares

Solve the same least squares problem from Q9.4 using QR factorization:
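
For a numerical cross-check after working the steps by hand, a sketch of the QR route with the same `A` and `b` as Q9.4: factor $A = \hat{Q}\hat{R}$ and back-substitute $\hat{R}x = \hat{Q}^T b$:

```python
import numpy as np
from scipy.linalg import solve_triangular

A = np.array([[1., 1.], [1., 2.], [1., 3.]])
b = np.array([1., 2., 2.])

Q, R = np.linalg.qr(A)            # reduced QR
x = solve_triangular(R, Q.T @ b)  # solve the upper triangular system R x = Q^T b
print(x)
```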


Q9.6: Condition Number Comparison

For the polynomial fitting problem with design matrix:

$$X_{ij} = t_i^{j-1}, \quad t_i = \frac{i-1}{n-1}, \quad i = 1, \ldots, n, \quad j = 1, \ldots, k+1$$
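
A sketch for building this design matrix and comparing $\kappa(X)$ with $\kappa(X^T X)$; the values of `n` and `k` below are placeholders for whatever the exercise specifies:

```python
import numpy as np

def design_matrix(n, k):
    """Monomial design matrix X_ij = t_i^(j-1) on equispaced t in [0, 1]."""
    t = np.linspace(0, 1, n)
    return np.column_stack([t**j for j in range(k + 1)])

X = design_matrix(100, 10)
print(np.linalg.cond(X))        # kappa(X)
print(np.linalg.cond(X.T @ X))  # roughly kappa(X)^2, limited by double precision
```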

Q9.7: Ill-Conditioned Regression

Implement the “dramatic example” from the notes:

import numpy as np

def fake_regression():
    """Ill-conditioned degree-14 polynomial regression problem from the notes."""
    N = 100
    k = 14
    t = np.linspace(0, 1, N)
    # Monomial design matrix: columns 1, t, t^2, ..., t^k
    X = np.column_stack([t**i for i in range(k+1)])
    y = np.exp(np.sin(4*t))
    # Scaling constant from the notes
    y = y / 2006.787453080206
    return X, y
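
One way to see the drama, assuming the `fake_regression` above: fit the coefficients via the normal equations and via `numpy.linalg.lstsq`, then compare the residuals.

```python
import numpy as np  # fake_regression is defined above

X, y = fake_regression()
print(np.linalg.cond(X))                        # very large condition number

beta_ne = np.linalg.solve(X.T @ X, X.T @ y)     # normal equations (unstable here)
beta_ls = np.linalg.lstsq(X, y, rcond=None)[0]  # SVD-based least squares solver

print(np.linalg.norm(X @ beta_ne - y))          # residuals to compare
print(np.linalg.norm(X @ beta_ls - y))
```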

Q9.8: Orthogonal Matrix Properties

Let $Q$ be an orthogonal matrix. Prove the following:


Q9.9: Householder Reflection


Q9.10: Polynomial Curve Fitting

Given data points $(0, 1), (1, 2), (2, 5), (3, 10), (4, 17)$:
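
For a numerical check of your fit, a sketch using a Vandermonde least squares system; the degree `deg` below is only a placeholder for whatever degree the exercise asks for:

```python
import numpy as np

x = np.array([0., 1., 2., 3., 4.])
y = np.array([1., 2., 5., 10., 17.])

deg = 2                                         # placeholder degree
V = np.vander(x, deg + 1, increasing=True)      # columns 1, x, x^2, ...
coeffs = np.linalg.lstsq(V, y, rcond=None)[0]
print(coeffs)                                   # lowest-order coefficient first
```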


Q9.11: Operator Norm of Orthogonal Matrices


Q9.12: Comparing Methods (Implementation)

Write Python functions to solve least squares three ways:

def solve_normal_equations(A, b):
    """Solve via A^T A x = A^T b"""
    pass

def solve_qr(A, b):
    """Solve via QR factorization"""
    pass

def solve_lstsq(A, b):
    """Solve using numpy.linalg.lstsq"""
    pass

Test all three on the ill-conditioned polynomial problem and compare accuracy.
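
If you get stuck, here is one reasonable way the three stubs could be filled in; treat it as a sketch, not the only correct implementation:

```python
import numpy as np
from scipy.linalg import solve_triangular

def solve_normal_equations(A, b):
    """Solve via A^T A x = A^T b."""
    return np.linalg.solve(A.T @ A, A.T @ b)

def solve_qr(A, b):
    """Solve via reduced QR factorization: R x = Q^T b."""
    Q, R = np.linalg.qr(A)
    return solve_triangular(R, Q.T @ b)

def solve_lstsq(A, b):
    """Solve using numpy.linalg.lstsq (SVD-based)."""
    return np.linalg.lstsq(A, b, rcond=None)[0]
```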


Q9.13: Householder QR Implementation (optional)

Implement QR factorization using Householder reflections.

Hint

For numerical stability, choose the sign of the Householder vector to avoid cancellation:

$$v = x + \text{sign}(x_1)\|x\|_2 e_1$$

If $x_1 = 0$, use the positive sign.
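
A minimal sketch of one Householder step under this sign convention: build the vector $v$ for a column $x$ and apply the reflection $H = I - 2vv^T/(v^Tv)$ without forming $H$ explicitly.

```python
import numpy as np

def householder_vector(x):
    """Householder vector v such that (I - 2 vv^T / v^T v) x = -sign(x_1) ||x||_2 e_1."""
    v = x.astype(float).copy()
    sign = 1.0 if x[0] >= 0 else -1.0   # positive sign when x_1 == 0
    v[0] += sign * np.linalg.norm(x)
    return v

x = np.array([3., 4., 0.])
v = householder_vector(x)
Hx = x - 2 * (v @ x) / (v @ v) * v      # apply the reflection to x
print(Hx)                               # approximately (-5, 0, 0)
```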


Q9.14: Givens Rotations (optional)

Givens rotations zero out individual matrix entries. The rotation matrix:

$$G(i, j, \theta) = \begin{pmatrix} 1 & & & & \\ & \cos\theta & & -\sin\theta & \\ & & 1 & & \\ & \sin\theta & & \cos\theta & \\ & & & & 1 \end{pmatrix}$$

rotates in the $(i, j)$ plane by angle $\theta$.

Hint

For numerical stability, use:

$$r = \sqrt{a^2 + b^2}, \quad c = a/r, \quad s = -b/r$$

Handle the case $r = 0$ specially (set $c = 1$, $s = 0$).
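
A sketch of the stable coefficient computation from the hint, returning $(c, s)$ so that the rotation maps $(a, b)$ to $(r, 0)$:

```python
import numpy as np

def givens(a, b):
    """Return (c, s) such that [[c, -s], [s, c]] @ [a, b] = [r, 0]."""
    r = np.hypot(a, b)       # sqrt(a^2 + b^2) without overflow
    if r == 0:
        return 1.0, 0.0      # nothing to rotate
    return a / r, -b / r

c, s = givens(3.0, 4.0)
G = np.array([[c, -s], [s, c]])
print(G @ np.array([3.0, 4.0]))  # approximately (5, 0)
```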


Q9.15: Partial Pivoting (optional)

In Gaussian elimination, partial pivoting swaps rows to put the largest (in absolute value) element on the diagonal. This improves numerical stability.

Hint

Without pivoting, the first step gives:

$$L = \begin{pmatrix} 1 & 0 \\ 10^{20} & 1 \end{pmatrix}, \quad U = \begin{pmatrix} 10^{-20} & 1 \\ 0 & 1 - 10^{20} \end{pmatrix}$$

With pivoting, we swap rows first, giving much more reasonable values.
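
A sketch of the comparison, assuming the classic small-pivot matrix $A = \begin{pmatrix} 10^{-20} & 1 \\ 1 & 1 \end{pmatrix}$ that is consistent with the $L$ and $U$ shown in the hint:

```python
import numpy as np
from scipy.linalg import lu

A = np.array([[1e-20, 1.0],
              [1.0,   1.0]])

# Without pivoting: the huge multiplier 10^20 destroys the (2,2) entry in floating point
L_np = np.array([[1.0, 0.0], [1e20, 1.0]])
U_np = np.array([[1e-20, 1.0], [0.0, 1.0 - 1e20]])
print(np.abs(L_np @ U_np - A).max())   # the (2,2) entry of A is lost entirely

# With partial pivoting: scipy.linalg.lu returns P, L, U with A = P @ L @ U
P, L, U = lu(A)
print(np.abs(P @ L @ U - A).max())     # essentially zero
```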


Self-Assessment Questions

Test your understanding with these conceptual questions:

  1. Orthogonality: If $q_1, q_2$ are orthonormal, what is $q_1 \cdot q_2$? What is $\|q_1\|_2$?

  2. Best approximation: Why is the residual $b - A\hat{x}$ orthogonal to the column space of $A$?

  3. Gram-Schmidt instability: When does catastrophic cancellation occur in Gram-Schmidt?

  4. QR factorization: If $A = QR$, what are the shapes of $Q$ and $R$ for $A \in \mathbb{R}^{100 \times 5}$?

  5. Condition number: If $\kappa(A) = 10^6$, what is $\kappa(A^T A)$? Why is this bad?

  6. QR vs normal equations: Why does QR give 7 correct digits when normal equations give none?

  7. Solving via QR: Given $A = \hat{Q}\hat{R}$, what system do you solve for least squares?

Q4.1: Vector Norms

Compute the 1-norm, 2-norm, and $\infty$-norm of the following vectors:


Q4.2: Matrix Norms

Compute $\|\mathcal{A}\|_1$ and $\|\mathcal{A}\|_\infty$ for:


Q4.3: Condition Number Computation


Q4.4: Sensitivity Analysis

Consider the system $\mathcal{A}\mathbf{x} = \mathbf{b}$ where $\mathcal{A} = \begin{pmatrix} 1 & 1 \\ 1 & 1.0001 \end{pmatrix}$ and $\mathbf{b} = \begin{pmatrix} 2 \\ 2 \end{pmatrix}$.
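
To experiment numerically, a small sketch that solves this system and a slightly perturbed one; the particular perturbation of $\mathbf{b}$ is just an illustrative choice:

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 1.0001]])
b = np.array([2.0, 2.0])

print(np.linalg.cond(A))             # large condition number (roughly 4e4)
x = np.linalg.solve(A, b)

b_pert = b + np.array([0.0, 1e-4])   # tiny change in the data...
x_pert = np.linalg.solve(A, b_pert)
print(x, x_pert)                     # ...large change in the solution
```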


Q4.5: Iterative Refinement (Newton’s Method in Disguise)

When solving $A\mathbf{x} = \mathbf{b}$, iterative refinement improves an approximate solution $\hat{\mathbf{x}}$ by repeatedly correcting it:

1. Compute residual: r = b - A*x̂
2. Solve for correction: A*δ = r
3. Update solution: x̂ ← x̂ + δ
4. Repeat until ||r|| is small enough

Hint

For part (d), compute the LU factorization once with scipy.linalg.lu_factor, then use scipy.linalg.lu_solve for each correction. The key insight is that the expensive $O(n^3)$ factorization is reused, making each refinement step only $O(n^2)$.

For part (e), computing the residual $\mathbf{r} = \mathbf{b} - A\hat{\mathbf{x}}$ in higher precision (or carefully) is important for ill-conditioned problems. In practice, LAPACK’s xGERFS routine does this.
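
A sketch of the refinement loop using the factor-once idea from the hint; the stopping tolerance and iteration cap are arbitrary choices:

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def iterative_refinement(A, b, tol=1e-14, max_iter=10):
    lu_piv = lu_factor(A)            # O(n^3) factorization, done once
    x = lu_solve(lu_piv, b)          # initial solve
    for _ in range(max_iter):
        r = b - A @ x                # residual (ideally computed in higher precision)
        if np.linalg.norm(r) < tol * np.linalg.norm(b):
            break
        delta = lu_solve(lu_piv, r)  # O(n^2) correction step reusing the factors
        x = x + delta
    return x
```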


Q4.6: Componentwise Condition Number (Skeel/Bauer)

The standard condition number $\kappa(A) = \|A\| \|A^{-1}\|$ can be overly pessimistic for badly scaled matrices. Consider:

$$A = \begin{pmatrix} 10^{-6} & 0 \\ 0 & 1 \end{pmatrix}, \quad \mathbf{b} = \begin{pmatrix} 10^{-6} \\ 1 \end{pmatrix}$$

Hint

For part (c), note that for a diagonal matrix $A = \text{diag}(d_1, \ldots, d_n)$:

  • $A^{-1} = \text{diag}(1/d_1, \ldots, 1/d_n)$

  • $|A^{-1}| \cdot |A| = \text{diag}(|1/d_1| \cdot |d_1|, \ldots) = I$

For part (e), use $(DB)^{-1} = B^{-1}D^{-1}$ and the fact that $|D^{-1}| \cdot |D| = I$ for diagonal $D$.
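
A sketch that contrasts the two measures for the matrix above, taking the componentwise (Skeel) condition number as $\| |A^{-1}|\,|A| \|_\infty$:

```python
import numpy as np

A = np.array([[1e-6, 0.0],
              [0.0,  1.0]])

kappa = np.linalg.cond(A, p=np.inf)   # standard condition number: 1e6
Ainv = np.linalg.inv(A)
kappa_skeel = np.linalg.norm(np.abs(Ainv) @ np.abs(A), np.inf)  # componentwise: 1
print(kappa, kappa_skeel)
```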


Q4.7: Row Equilibration

Row equilibration scales each row of $A$ so that all rows have comparable norms. Specifically, if we define the diagonal matrix:

$$D_r = \text{diag}\left(\frac{1}{\|a_1\|_\infty}, \frac{1}{\|a_2\|_\infty}, \ldots, \frac{1}{\|a_n\|_\infty}\right)$$

where $a_i^T$ is the $i$-th row of $A$, then $\tilde{A} = D_r A$ has $\|\tilde{a}_i\|_\infty = 1$ for all rows.

Hint

For part (c), recall that $\kappa_{\text{CR}}(A) = \||A^{-1}| \cdot |A|\|$. For a diagonal matrix $D$ with positive diagonal entries, we have $|D| = D$ and $|D^{-1}| = D^{-1}$, so $|D^{-1}||D| = I$.
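
A minimal sketch of row equilibration in NumPy; the matrix `A` below is just an illustrative badly scaled example:

```python
import numpy as np

A = np.array([[1e6, 2e6],
              [3.0, 4.0]])                 # rows of very different scale

row_norms = np.max(np.abs(A), axis=1)      # ||a_i||_inf for each row
D_r = np.diag(1.0 / row_norms)
A_tilde = D_r @ A                          # every row now has inf-norm 1

print(np.max(np.abs(A_tilde), axis=1))     # [1., 1.]
print(np.linalg.cond(A), np.linalg.cond(A_tilde))  # compare conditioning
```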


Q4.8: Row and Column Equilibration

For highly ill-conditioned matrices, we can apply scaling to both rows and columns:

$$\tilde{A} = D_r A D_c$$

where $D_r$ scales rows and $D_c$ scales columns. The transformed system becomes:

$$\tilde{A}\mathbf{y} = \tilde{\mathbf{b}}, \quad \text{where } \tilde{\mathbf{b}} = D_r \mathbf{b} \text{ and } \mathbf{x} = D_c \mathbf{y}$$

Hint

For column equilibration, you can use the $\infty$-norm of each column:

D_c = np.diag(1.0 / np.max(np.abs(A), axis=0))

For part (c), you need to compute $|A^{-1}|\,|A|$ before and after column scaling. The column scaling changes the product in a non-trivial way.
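
A sketch of two-sided scaling and the back-substitution $\mathbf{x} = D_c \mathbf{y}$, assuming an illustrative badly scaled matrix and the column-scaling choice from the hint:

```python
import numpy as np

A = np.array([[1e6, 2.0],
              [3.0, 4e-6]])
b = np.array([1.0, 2.0])

D_r = np.diag(1.0 / np.max(np.abs(A), axis=1))   # row scaling
D_c = np.diag(1.0 / np.max(np.abs(A), axis=0))   # column scaling (as in the hint)

A_tilde = D_r @ A @ D_c
b_tilde = D_r @ b

y = np.linalg.solve(A_tilde, b_tilde)
x = D_c @ y                        # undo the column scaling to recover x
print(np.allclose(A @ x, b))       # the recovered x solves the original system
```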


Self-Assessment Questions

Test your understanding with these conceptual questions:

  1. Invertibility: If $\det(\mathcal{A}) = 0$, what can you say about the solution of $\mathcal{A}\mathbf{x} = \mathbf{b}$?

  2. Norms: For $\mathbf{x} = (1, -2, 3)^T$, compute $\|\mathbf{x}\|_1$, $\|\mathbf{x}\|_2$, and $\|\mathbf{x}\|_\infty$.

  3. Condition Number: If $\kappa(\mathcal{A}) = 10^8$, approximately how many digits of accuracy can you expect in double precision?

  4. Equilibration: Why does row scaling not change the componentwise condition number, but column scaling can?

  5. Iterative Refinement: Why do we compute the residual in higher precision (or carefully) for ill-conditioned problems?