Practice QR factorization and least squares.
Q9.1: Orthogonality¶
Let and .
(a) Compute . Are and orthogonal?
(b) Normalize to get a unit vector .
(c) Compute the projection of onto .
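For reference, the projection of a vector $v$ onto a nonzero vector $u$ is
$$
\operatorname{proj}_{u}(v) = \frac{u^{\mathsf T} v}{u^{\mathsf T} u}\, u.
$$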
Q9.2: Gram-Schmidt by Hand¶
Apply the Gram-Schmidt algorithm to the vectors:
(a) Compute the orthonormal basis .
(b) Verify that .
(c) Write the QR factorization $A = QR$, where $Q$ has the orthonormal vectors from (a) as its columns.
Q9.3: QR Factorization¶
For the matrix :
(a) Compute the reduced QR factorization by hand.
(b) Verify that .
(c) Use Python to check your answer with
numpy.linalg.qr.
Q9.4: Normal Equations¶
Consider the least squares problem with:
(a) Form the normal equations $A^TA\hat{x} = A^Tb$.
(b) Solve for $\hat{x}$.
(c) Compute the residual $r = b - A\hat{x}$ and verify that $A^Tr = 0$ (i.e., $r$ is orthogonal to the columns of $A$).
Q9.5: QR for Least Squares¶
Solve the same least squares problem from Q9.4 using QR factorization:
(a) Compute the QR factorization of $A$.
(b) Solve $\hat{R}\hat{x} = \hat{Q}^Tb$ by back substitution.
(c) Verify you get the same answer as Q9.4.
Q9.6: Condition Number Comparison¶
For the polynomial fitting problem with design matrix:
(a) Write Python code to construct the design matrix $A$ for the given number of points and polynomial degree.
(b) Compute $\kappa(A)$ and $\kappa(A^TA)$ using numpy.linalg.cond.
(c) Verify that $\kappa(A^TA) = \kappa(A)^2$.
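A minimal sketch of (a)–(c), using a monomial design matrix with placeholder values $N = 100$, $k = 14$ (the same as in Q9.7 below):

```python
import numpy as np

N, k = 100, 14                                       # placeholder values (match Q9.7)
t = np.linspace(0, 1, N)
A = np.column_stack([t**i for i in range(k + 1)])    # monomial design matrix

kappa_A = np.linalg.cond(A)
kappa_AtA = np.linalg.cond(A.T @ A)
print(kappa_A, kappa_A**2, kappa_AtA)                # in exact arithmetic, cond(A^T A) = cond(A)^2
```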
Q9.7: Ill-Conditioned Regression¶
Implement the “dramatic example” from the notes:
def fake_regression():
N = 100
k = 14
t = np.linspace(0, 1, N)
X = np.column_stack([t**i for i in range(k+1)])
y = np.exp(np.sin(4*t))
y = y / 2006.787453080206
    return X, y

(a) Solve for regression coefficients using the normal equations.
(b) Solve using numpy.linalg.qr and back substitution (a sketch of both solves appears after part (d)).
(c) Compare the computed coefficient of $t^{14}$ (Python index 14) to the true value of 1. Which method is more accurate?
(d) Explain the difference in terms of condition numbers.
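A minimal sketch of the two solves, assuming fake_regression from above is defined and using scipy.linalg.solve_triangular for the back substitution:

```python
import numpy as np
from scipy.linalg import solve_triangular

X, y = fake_regression()

# (a) Normal equations: (X^T X) beta = X^T y
beta_normal = np.linalg.solve(X.T @ X, X.T @ y)

# (b) Reduced QR: X = Q R, then back substitution on R beta = Q^T y
Q, R = np.linalg.qr(X)
beta_qr = solve_triangular(R, Q.T @ y)

# (c) The coefficient with Python index 14 has true value 1
print(beta_normal[14], beta_qr[14])
```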
Q9.8: Orthogonal Matrix Properties¶
Let $Q$ be an orthogonal matrix. Prove the following:
(a) for all .
(b) If $Q_1$ and $Q_2$ are orthogonal, then $Q_1Q_2$ is orthogonal.
(c) The eigenvalues of $Q$ have absolute value 1.
Q9.9: Householder Reflection¶
(a) For , write the Householder matrix .
(b) Show that reflects vectors across the -axis.
(c) Find the Householder reflector that maps to .
Q9.10: Polynomial Curve Fitting¶
Given data points :
(a) Set up the least squares problem for fitting a quadratic $y \approx c_0 + c_1 t + c_2 t^2$.
(b) Solve using QR factorization (by hand or using Python).
(c) Plot the data points and your fitted curve.
(d) What is the sum of squared residuals?
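A sketch of (a), (b), and (d) with hypothetical data (substitute the points given in the problem):

```python
import numpy as np

# Hypothetical data; replace with the data points from the problem statement
t = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 5.0, 10.0])

A = np.column_stack([np.ones_like(t), t, t**2])  # columns 1, t, t^2 for a quadratic fit
Q, R = np.linalg.qr(A)                           # reduced QR factorization
c = np.linalg.solve(R, Q.T @ y)                  # coefficients c0, c1, c2

residual = y - A @ c
print(c, residual @ residual)                    # fitted coefficients and sum of squared residuals
```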
Q9.11: Operator Norm of Orthogonal Matrices¶
(a) Show that $\|Qx\|_2 = \|x\|_2$ for any orthogonal matrix $Q$ and any vector $x$.
(b) Show that .
(c) Conclude that .
Q9.12: Comparing Methods (Implementation)¶
Write Python functions to solve least squares three ways:
def solve_normal_equations(A, b):
"""Solve via A^T A x = A^T b"""
pass
def solve_qr(A, b):
"""Solve via QR factorization"""
pass
def solve_lstsq(A, b):
"""Solve using numpy.linalg.lstsq"""
    pass

Test all three on the ill-conditioned polynomial problem from Q9.7 and compare accuracy.
Q9.13: Householder QR Implementation (optional)¶
Implement QR factorization using Householder reflections.
(a) Write a function to compute the Householder vector $v$ such that the corresponding reflector $H$ maps $x$ to $\|x\|e_1$:
def householder_vector(x):
    """Return Householder vector v such that Hx = ||x||*e1."""
    pass

(b) Write the full Householder QR factorization:
def householder_qr(A):
    """Compute QR factorization using Householder reflections.

    Returns:
        Q: Orthogonal matrix
        R: Upper triangular matrix
    """
    pass

(c) Test your implementation against numpy.linalg.qr on random matrices.
(d) Compare the numerical stability of your Householder QR to classical Gram-Schmidt on the ill-conditioned polynomial fitting problem from Q9.7.
Hint
For numerical stability, choose the sign of the Householder vector to avoid cancellation:
If $x_1 \ge 0$, use the positive sign.
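A minimal sketch of that choice (note that with this convention $H$ maps $x$ to $-\operatorname{sign}(x_1)\,\|x\|\,e_1$; the sign is then absorbed into $R$):

```python
import numpy as np

def householder_vector_stable(x):
    """Householder vector with the cancellation-avoiding sign choice (sketch)."""
    v = np.asarray(x, dtype=float).copy()
    sign = 1.0 if v[0] >= 0 else -1.0   # positive sign when x1 >= 0 avoids subtracting nearly equal numbers
    v[0] += sign * np.linalg.norm(x)
    return v / np.linalg.norm(v)
```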
Q9.14: Givens Rotations (optional)¶
Givens rotations zero out individual matrix entries. The rotation matrix
$$
G = \begin{pmatrix} c & -s \\ s & c \end{pmatrix}, \qquad c = \cos\theta, \; s = \sin\theta,
$$
rotates vectors in the plane by angle $\theta$.
(a) Given $a$ and $b$, derive formulas for $c$ and $s$ such that
$$
\begin{pmatrix} c & -s \\ s & c \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} r \\ 0 \end{pmatrix},
\qquad r = \sqrt{a^2 + b^2}.
$$
(b) Implement a function to compute and :
def givens_rotation(a, b):
    """Compute c, s such that [c -s; s c] @ [a; b] = [r; 0]."""
    pass

(c) Implement QR factorization using Givens rotations:
def givens_qr(A):
    """Compute QR factorization using Givens rotations."""
    pass

(d) When are Givens rotations preferred over Householder? (Hint: think about sparse matrices.)
Hint
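One standard choice, which part (a) asks you to derive, is $c = a/r$, $s = -b/r$ with $r = \sqrt{a^2+b^2}$. A quick numerical check:

```python
import numpy as np

a, b = 3.0, 4.0
r = np.hypot(a, b)               # sqrt(a^2 + b^2) = 5
c, s = a / r, -b / r             # the choice above
G = np.array([[c, -s],
              [s,  c]])
print(G @ np.array([a, b]))      # approximately [5, 0]: the rotation zeroes the second entry
```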
Q9.15: Partial Pivoting (optional)¶
In Gaussian elimination, partial pivoting swaps rows to put the largest (in absolute value) element on the diagonal. This improves numerical stability.
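As a small illustration of the pivot-selection step only (a sketch, not the full factorization asked for in (a)):

```python
import numpy as np

def pivot_step(U, perm, k):
    """Swap row k with the row holding the largest |entry| in column k (rows k and below)."""
    p = k + np.argmax(np.abs(U[k:, k]))      # row index of the largest pivot candidate
    if p != k:
        U[[k, p]] = U[[p, k]]                # swap the rows of the working matrix
        perm[k], perm[p] = perm[p], perm[k]  # record the swap in the permutation vector
    return U, perm
```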
(a) Implement LU factorization with partial pivoting:
def lu_pivot(A):
    """LU factorization with partial pivoting.

    Returns:
        P: Permutation matrix (or permutation vector)
        L: Lower triangular with 1s on diagonal
        U: Upper triangular

    Such that PA = LU.
    """
    pass

(b) Test on the matrix:
Compare LU without pivoting vs. with pivoting. What happens to the computed solution of $Ax = b$?
(c) Explain why the element becomes very large without pivoting, leading to large errors.
(d) The growth factor is $\rho = \dfrac{\max_{i,j,k} |a_{ij}^{(k)}|}{\max_{i,j} |a_{ij}|}$, the largest entry appearing during elimination divided by the largest entry of $A$. For the matrix above, compute $\rho$ with and without pivoting.
Hint
Without pivoting, the first elimination step produces an enormous multiplier and enormous intermediate entries. With pivoting, we swap rows first, giving much more reasonable values.
Self-Assessment Questions¶
Test your understanding with these conceptual questions:
Orthogonality: If are orthonormal, what is ? What is ?
Best approximation: Why is the least squares residual orthogonal to the column space of $A$?
Gram-Schmidt instability: When does catastrophic cancellation occur in Gram-Schmidt?
QR factorization: If $A$ is $m \times n$ with $m > n$, what are the shapes of $\hat{Q}$ and $\hat{R}$ in the reduced factorization $A = \hat{Q}\hat{R}$?
Condition number: How does $\kappa(A^TA)$ relate to $\kappa(A)$? Why is this bad?
QR vs normal equations: Why does QR give 7 correct digits when normal equations give none?
Solving via QR: Given $A = \hat{Q}\hat{R}$, what system do you solve for least squares?
Q4.1: Vector Norms¶
Compute the 1-norm, 2-norm, and $\infty$-norm of the following vectors:
(a)
(b)
(c)
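For checking the hand computations, NumPy evaluates all three norms directly (shown here on a hypothetical example vector):

```python
import numpy as np

x = np.array([3.0, -4.0, 0.0])            # hypothetical example vector
print(np.linalg.norm(x, 1),               # 1-norm: |3| + |-4| + |0| = 7
      np.linalg.norm(x, 2),               # 2-norm: sqrt(9 + 16) = 5
      np.linalg.norm(x, np.inf))          # infinity-norm: max |x_i| = 4
```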
Q4.2: Matrix Norms¶
Compute $\|A\|_1$ and $\|A\|_\infty$ for:
(a)
(b)
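Similarly, NumPy computes the induced matrix 1-norm and $\infty$-norm (hypothetical example matrix):

```python
import numpy as np

A = np.array([[1.0, -2.0],
              [3.0,  4.0]])               # hypothetical example matrix
print(np.linalg.norm(A, 1),               # max absolute column sum = 6
      np.linalg.norm(A, np.inf))          # max absolute row sum = 7
```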
Q4.3: Condition Number Computation¶
(a) Compute the condition number for .
(b) Use NumPy to compute the condition number of the Hilbert matrix.
(c) For what size Hilbert matrix does the condition number exceed $10^{16}$ (roughly $1/\epsilon_{\text{mach}}$) in double precision?
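A sketch for parts (b) and (c), assuming scipy.linalg.hilbert is available:

```python
import numpy as np
from scipy.linalg import hilbert

for n in range(2, 15):
    print(n, np.linalg.cond(hilbert(n)))   # the condition number grows rapidly with n
```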
Q4.4: Sensitivity Analysis¶
Consider the system where and .
(a) Solve the system.
(b) Compute the condition number.
(c) Perturb by 0.0001 and solve again. How much does the solution change?
(d) Is this consistent with the condition number?
Q4.5: Iterative Refinement (Newton’s Method in Disguise)¶
When solving $Ax = b$, iterative refinement improves an approximate solution $\hat{x}$ by repeatedly correcting it:
1. Compute residual: r = b - A*x̂
2. Solve for correction: A*δ = r
3. Update solution: x̂ ← x̂ + δ
4. Repeat until ||r|| is small enough

(a) Let $f(x) = b - Ax$ (the residual as a function of $x$). Show that finding a root of $f$ is equivalent to solving $Ax = b$.
(b) Write Newton’s method for solving $f(x) = 0$. Recall that Newton’s method is
$$
x_{k+1} = x_k - J_f(x_k)^{-1} f(x_k).
$$
What is the Jacobian of a linear function?
(c) Show that Newton’s method applied to $f$ gives exactly the iterative refinement algorithm above.
(d) Implementation: Write Python code for iterative refinement. Use the LU factorization of $A$ (computed once) to solve $A\delta = r$ for each correction $\delta$.
def iterative_refinement(A, b, x0, tol=1e-12, max_iter=10):
    """Improve solution x0 using iterative refinement."""
    # Your code here
    pass

(e) Test on an ill-conditioned system. Create a Hilbert matrix, solve $Ax = b$ with $b = A\mathbf{1}$ (so the true solution is all ones), and apply iterative refinement. How many iterations are needed?
Hint
For part (d), compute the LU factorization once with scipy.linalg.lu_factor, then use scipy.linalg.lu_solve for each correction. The key insight is that the expensive factorization is reused, making each refinement step only $O(n^2)$.
For part (e), computing the residual in higher precision (or carefully) is important for ill-conditioned problems. In practice, LAPACK’s xGERFS routine does this.
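A minimal sketch of part (d) along the lines of the hint (the function name is just a placeholder):

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def iterative_refinement_sketch(A, b, x0, tol=1e-12, max_iter=10):
    """Refine x0 by repeatedly solving A*delta = r, reusing one LU factorization."""
    lu_piv = lu_factor(A)                 # O(n^3), done once
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(max_iter):
        r = b - A @ x                     # residual; ideally computed carefully / in higher precision
        if np.linalg.norm(r) < tol:
            break
        x = x + lu_solve(lu_piv, r)       # each correction costs only O(n^2)
    return x
```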
Q4.6: Componentwise Condition Number (Skeel/Bauer)¶
The standard condition number can be overly pessimistic for badly scaled matrices. Consider:
(a) Compute $\kappa(A)$. Is this matrix “ill-conditioned” by this measure?
(b) Solve $Ax = b$. What is the exact solution? Now perturb $A$ by small relative errors in each entry (i.e., replace each $a_{ij}$ with $a_{ij}(1 + \delta_{ij})$ for small $\delta_{ij}$). How much does the solution change?
(c) The componentwise relative condition number is defined as:
$$
\kappa_{\text{rel}}(A) = \big\|\, |A^{-1}|\,|A| \,\big\|_\infty,
$$
where $|M|$ denotes the matrix of absolute values. Compute $\kappa_{\text{rel}}$ for the diagonal matrix above.
(d) General principle: Show that for any diagonal matrix $D$, we have $\kappa_{\text{rel}}(D) = 1$. This explains why diagonal systems are always “easy” even when $\kappa(D)$ is large.
(e) More generally, if $D$ is diagonal and $A$ is any invertible matrix, show that:
$$
\kappa_{\text{rel}}(DA) = \kappa_{\text{rel}}(A).
$$
This means row scaling doesn’t affect the componentwise condition number.
Hint
For part (c), note that for a diagonal matrix $D$: $|D^{-1}|\,|D| = I$.
For part (e), use $(DA)^{-1} = A^{-1}D^{-1}$ and the fact that $|DM| = |D|\,|M|$ for diagonal $D$.
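A small helper matching the definition above (a sketch; forming $A^{-1}$ explicitly is fine for the small matrices in this problem):

```python
import numpy as np

def skeel_cond(A):
    """Componentwise (Bauer-Skeel) condition number || |A^{-1}| |A| ||_inf."""
    A_inv = np.linalg.inv(A)
    return np.linalg.norm(np.abs(A_inv) @ np.abs(A), np.inf)

D = np.diag([1e10, 1e-10])                 # badly scaled but diagonal
print(np.linalg.cond(D), skeel_cond(D))    # standard kappa is huge, componentwise kappa is 1
```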
Q4.7: Row Equilibration¶
Row equilibration scales each row of $A$ so that all rows have comparable norms. Specifically, if we define the diagonal matrix $D_r$ by
$$
(D_r)_{ii} = \frac{1}{\|a_i\|},
$$
where $a_i$ is the $i$-th row of $A$, then every row of $D_r A$ has norm 1.
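A one-line construction, assuming the $\infty$-norm (largest absolute entry) is used for each row, mirroring the column-scaling hint in Q4.8:

```python
import numpy as np

A = np.array([[1e8, 2e8],
              [3.0, 4.0]])                    # hypothetical badly scaled matrix
d_r = 1.0 / np.max(np.abs(A), axis=1)         # row scaling factors (assumed infinity-norm)
A_eq = d_r[:, None] * A                       # same as diag(d_r) @ A; every row now has max |entry| = 1
print(A_eq)
```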
(a) Consider the badly scaled matrix:
Compute $\kappa(A)$. Then compute the row equilibration matrix $D_r$, form $D_r A$, and compute $\kappa(D_r A)$.
(b) Proof: Show that for any invertible diagonal matrix $D$, the scaled system $(DA)x = Db$ has the same solution as $Ax = b$. Why does this mean row equilibration doesn’t change the mathematical problem?
(c) Proof: Show that row equilibration does not change the componentwise condition number:
$$
\kappa_{\text{rel}}(D_r A) = \kappa_{\text{rel}}(A).
$$
(Hint: Use the result from Q4.6(e).)
(d) Implementation: Write a Python function for row equilibration:
def row_equilibrate(A):
    """
    Compute row equilibration of A.

    Returns:
        A_eq: Row-equilibrated matrix
        D_r: Diagonal scaling matrix (as 1D array)
    """
    # Your code here
    pass

(e) Test your implementation on the matrix from part (a). Solve $Ax = b$ both with and without equilibration. Which gives a more accurate solution?
Hint
For part (c), recall that $\kappa_{\text{rel}}(A) = \|\,|A^{-1}|\,|A|\,\|_\infty$. For a diagonal matrix $D$ with positive diagonal entries, we have $|D| = D$ and $|D^{-1}| = D^{-1}$, so $|D^{-1}|\,|D| = D^{-1}D = I$ and $\kappa_{\text{rel}}(D) = 1$.
Q4.8: Row and Column Equilibration¶
For highly ill-conditioned matrices, we can apply scaling to both rows and columns:
$$
\tilde{A} = D_r A D_c,
$$
where $D_r$ scales rows and $D_c$ scales columns. The transformed system becomes:
$$
\tilde{A}\, y = D_r b, \qquad x = D_c y.
$$
(a) Proof: Verify that if $y$ solves $\tilde{A} y = D_r b$ with $\tilde{A} = D_r A D_c$ and $x = D_c y$, then $x$ satisfies $Ax = b$.
(b) Consider the matrix:
Compute $\kappa(A)$. Now apply row equilibration only (as in Q4.7) and compute the new condition number. Finally, apply column equilibration to the result and compute the final condition number. How much did equilibration help?
(c) Proof: Unlike row scaling, column scaling can change the componentwise condition number. Show by example that $\kappa_{\text{rel}}(AD_c) \neq \kappa_{\text{rel}}(A)$ in general.
(Hint: Try a small concrete $A$ and a diagonal $D_c$.)
(d) Implementation: Write a complete equilibration solver:
def solve_equilibrated(A, b):
    """
    Solve Ax = b using row and column equilibration.

    Steps:
        1. Compute row scaling D_r (as in Q4.7)
        2. Apply row scaling: A1 = D_r @ A, b1 = D_r @ b
        3. Compute column scaling D_c for A1
        4. Apply column scaling: A2 = A1 @ D_c
        5. Solve A2 @ y = b1
        6. Recover x = D_c @ y

    Returns:
        x: Solution to original system
        kappa_original: Condition number of A
        kappa_equilibrated: Condition number of equilibrated A
    """
    # Your code here
    pass

(e) Experiment: Generate a random ill-conditioned matrix using:
def make_ill_conditioned(n, target_kappa):
    """Create an n×n matrix with condition number ≈ target_kappa."""
    U, _ = np.linalg.qr(np.random.randn(n, n))
    V, _ = np.linalg.qr(np.random.randn(n, n))
    s = np.logspace(0, -np.log10(target_kappa), n)
    return U @ np.diag(s) @ V.T

For your chosen $n$ and target_kappa = 1e12:
- Solve directly using numpy.linalg.solve.
- Solve using your equilibration routine.
- Compare the errors when the true solution is known (for instance, choose $x$ and set $b = Ax$).
Hint
For column equilibration, you can use the $\infty$-norm of each column:

D_c = np.diag(1.0 / np.max(np.abs(A), axis=0))

For part (c), you need to compute the componentwise condition number before and after column scaling. The column scaling changes the product $|A^{-1}|\,|A|$ in a non-trivial way.
Self-Assessment Questions¶
Test your understanding with these conceptual questions:
Invertibility: If $A$ is invertible, what can you say about the solution of $Ax = b$?
Norms: For , compute , , and .
Condition Number: Given $\kappa(A)$, approximately how many digits of accuracy can you expect in double precision?
Equilibration: Why does row scaling not change the componentwise condition number, but column scaling can?
Iterative Refinement: Why do we compute the residual in higher precision (or carefully) for ill-conditioned problems?