
## Background

This section reviews material from linear algebra (Math 235): vectors, matrices, and norms. It then introduces the condition number, which quantifies how sensitive a linear system is to perturbations. The emphasis is on the computational perspective: how do we measure size, and how does that connect to the accuracy of numerical algorithms?

## Big Idea

Vectors are the objects we compute with; matrices are linear functions that act on vectors. To analyze algorithms, we need to measure size: norms for vectors, induced norms for matrices. The condition number $\kappa(A) = \|A\|\,\|A^{-1}\|$ measures the ratio of maximum to minimum stretch of unit vectors, and tells us how sensitive a linear system is to perturbations.

## Vectors and Vector Spaces

The fundamental objects in numerical linear algebra are vectors, the elements of a vector space.

:::{prf:definition} Vector Space
:label: def-vector-space

A vector space $V$ over $\mathbb{R}$ is a set with two operations:

- Addition: $\mathbf{x} + \mathbf{y} \in V$ for $\mathbf{x}, \mathbf{y} \in V$
- Scalar multiplication: $\alpha\mathbf{x} \in V$ for $\alpha \in \mathbb{R}$, $\mathbf{x} \in V$

satisfying the usual axioms (associativity, commutativity, distributivity, zero element, inverses).
:::

The canonical example is $\mathbb{R}^n$, the space of column vectors with $n$ real components. But vector spaces are far more general:

:::{prf:example} Examples of Vector Spaces
:label: ex-vector-spaces

| Space | Elements | Dimension |
| --- | --- | --- |
| $\mathbb{R}^n$ | Column vectors | $n$ |
| $\mathbb{C}^n$ | Complex vectors | $n$ (over $\mathbb{C}$) |
| $\mathcal{P}_n$ | Polynomials of degree $\leq n$ | $n+1$ |
| $C[a,b]$ | Continuous functions on $[a,b]$ | $\infty$ |
| $L^2[a,b]$ | Square-integrable functions | $\infty$ |
:::

The same linear algebra concepts (basis, dimension, linear maps, norms) apply to all these spaces. When you solve a PDE numerically, you are doing linear algebra in a function space. The finite-dimensional theory ($\mathbb{R}^n$) is the template for the infinite-dimensional theory (functional analysis).

This is why we emphasize the abstract structure: vectors are elements of a vector space, matrices are linear maps. The specifics of $\mathbb{R}^n$ are just one instance.

## Vector Norms

To analyze errors and convergence, we need to measure the "size" of vectors.

:::{prf:definition} Norm
:label: def-norm

A norm $\|\cdot\|: V \to \mathbb{R}$ on a vector space $V$ satisfies:

1. $\|\mathbf{x}\| \geq 0$ with equality iff $\mathbf{x} = \mathbf{0}$ (positive definiteness)
2. $\|\alpha\mathbf{x}\| = |\alpha|\,\|\mathbf{x}\|$ (homogeneity)
3. $\|\mathbf{x} + \mathbf{y}\| \leq \|\mathbf{x}\| + \|\mathbf{y}\|$ (triangle inequality)
:::

A vector space equipped with a norm is called a normed vector space. If it is also complete (Cauchy sequences converge), it is a Banach space, the natural setting for analysis.

### The $p$-Norms on $\mathbb{R}^n$

$$
\|\mathbf{x}\|_p = \left(\sum_{i=1}^n |x_i|^p\right)^{1/p}
$$

| Name | Formula | Interpretation |
| --- | --- | --- |
| 1-norm | $\lVert\mathbf{x}\rVert_1 = \sum_i \lvert x_i \rvert$ | Manhattan distance |
| 2-norm | $\lVert\mathbf{x}\rVert_2 = \sqrt{\sum_i x_i^2}$ | Euclidean length |
| $\infty$-norm | $\lVert\mathbf{x}\rVert_\infty = \max_i \lvert x_i \rvert$ | Maximum component |
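A quick numerical sketch (assuming NumPy is available): the three $p$-norms computed directly from their definitions, cross-checked against `np.linalg.norm`.

```python
import numpy as np

x = np.array([3.0, -4.0, 1.0])

# p-norms computed directly from the definitions above
norm1 = np.sum(np.abs(x))        # Manhattan distance
norm2 = np.sqrt(np.sum(x**2))    # Euclidean length
norm_inf = np.max(np.abs(x))     # maximum component

# Cross-check against NumPy's built-in vector norms
assert np.isclose(norm1, np.linalg.norm(x, 1))
assert np.isclose(norm2, np.linalg.norm(x, 2))
assert np.isclose(norm_inf, np.linalg.norm(x, np.inf))

print(norm1, norm2, norm_inf)  # 8.0 5.0990... 4.0
```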

### Norm Equivalence

:::{prf:theorem} Norm Equivalence
:label: thm-norm-equivalence

All norms on $\mathbb{R}^n$ are equivalent: for any two norms $\|\cdot\|_a$ and $\|\cdot\|_b$, there exist constants $c, C > 0$ such that

$$
c\|\mathbf{x}\|_a \leq \|\mathbf{x}\|_b \leq C\|\mathbf{x}\|_a \quad \text{for all } \mathbf{x}.
$$
:::

This is a finite-dimensional phenomenon. In infinite dimensions (function spaces), different norms can give genuinely different notions of convergence, a key subtlety in PDE theory.
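As a concrete instance of equivalence, the pair $(\|\cdot\|_\infty, \|\cdot\|_2)$ on $\mathbb{R}^n$ satisfies $\|\mathbf{x}\|_\infty \leq \|\mathbf{x}\|_2 \leq \sqrt{n}\,\|\mathbf{x}\|_\infty$, with $c = 1$ and $C = \sqrt{n}$. A small sketch (NumPy assumed) checks the bounds on random samples:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10

# Sharp equivalence constants for this pair of norms:
#   ||x||_inf <= ||x||_2 <= sqrt(n) * ||x||_inf
for _ in range(1000):
    x = rng.standard_normal(n)
    ninf = np.linalg.norm(x, np.inf)
    n2 = np.linalg.norm(x, 2)
    assert ninf <= n2 <= np.sqrt(n) * ninf + 1e-12

print("equivalence bounds hold for all samples")
```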

### Function Space Norms

The same idea extends to functions:

| Space | Norm | Formula |
| --- | --- | --- |
| $C[a,b]$ | Supremum norm | $\lVert f \rVert_\infty = \max_{x \in [a,b]} \lvert f(x) \rvert$ |
| $L^2[a,b]$ | $L^2$ norm | $\lVert f \rVert_2 = \sqrt{\int_a^b \lvert f(x) \rvert^2 \, dx}$ |
| $L^p[a,b]$ | $L^p$ norm | $\lVert f \rVert_p = \left(\int_a^b \lvert f(x) \rvert^p \, dx\right)^{1/p}$ |

These are the continuous analogs of the discrete $p$-norms: sums become integrals.
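In practice the integral in the $L^2$ norm is approximated by a quadrature rule, which turns the function norm back into a (weighted) discrete sum. A minimal sketch, using a hand-rolled composite trapezoid rule (the helper `l2_norm` is our own, not a library function):

```python
import numpy as np

def l2_norm(f, a, b, n=100_000):
    """Approximate the L^2[a,b] norm of f with the composite trapezoid rule."""
    x = np.linspace(a, b, n + 1)
    y = np.abs(f(x)) ** 2
    h = (b - a) / n
    integral = h * (0.5 * y[0] + np.sum(y[1:-1]) + 0.5 * y[-1])
    return np.sqrt(integral)

# ||f||_{L^2[0,1]} for f(x) = x is sqrt(∫_0^1 x^2 dx) = sqrt(1/3)
approx = l2_norm(lambda x: x, 0.0, 1.0)
print(approx, np.sqrt(1 / 3))
```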

## Matrices as Linear Maps

Matrices are linear functions between vector spaces. A matrix $A \in \mathbb{R}^{m \times n}$ defines

$$
T_A: \mathbb{R}^n \to \mathbb{R}^m, \qquad T_A(\mathbf{x}) = A\mathbf{x}.
$$

Linearity means

$$
T_A(\alpha\mathbf{x} + \beta\mathbf{y}) = \alpha T_A(\mathbf{x}) + \beta T_A(\mathbf{y}).
$$

Every linear map $\mathbb{R}^n \to \mathbb{R}^m$ corresponds to a unique $m \times n$ matrix, and vice versa.

### The Bigger Picture

In infinite dimensions, linear maps between function spaces are called operators. Differential operators ($d/dx$), integral operators ($\int K(x,y) f(y)\, dy$), and solution operators for PDEs are all linear maps, the infinite-dimensional analogs of matrices.

### The Matrix-Vector Product

Given $A \in \mathbb{R}^{m \times n}$ and $\mathbf{x} \in \mathbb{R}^n$:

$$
(A\mathbf{x})_i = \sum_{j=1}^{n} a_{ij} x_j, \quad i = 1, \ldots, m
$$

Cost: $2mn$ floating-point operations.

Two views:

| Row View | Column View |
| --- | --- |
| Each $(A\mathbf{x})_i$ is a dot product: $\mathbf{a}_i^T \mathbf{x}$ | $A\mathbf{x}$ is a linear combination of columns: $\sum_j x_j \mathbf{a}^{(j)}$ |

The column view reveals that $A\mathbf{x}$ lives in the column space (range) of $A$.
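The two views can be sketched directly in NumPy; both reproduce `A @ x`:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])   # 3x2: maps R^2 -> R^3
x = np.array([10.0, -1.0])

# Row view: each entry of Ax is a dot product of a row of A with x
row_view = np.array([A[i, :] @ x for i in range(A.shape[0])])

# Column view: Ax is a linear combination of the columns of A
col_view = x[0] * A[:, 0] + x[1] * A[:, 1]

assert np.allclose(row_view, A @ x)
assert np.allclose(col_view, A @ x)
print(A @ x)  # [ 8. 26. 44.]
```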

### Geometric Interpretation

| Matrix Type | Geometric Effect |
| --- | --- |
| Diagonal | Scaling along coordinate axes |
| Orthogonal ($Q^TQ = I$) | Rotation and/or reflection |
| Symmetric | Scaling along eigenvector directions |
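The first two rows of the table can be checked numerically; in this sketch (NumPy assumed) a rotation preserves the 2-norm, while a diagonal matrix scales each coordinate independently:

```python
import numpy as np

theta = 0.7
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])   # rotation: Q^T Q = I
D = np.diag([3.0, 0.5])                           # scaling along the axes

x = np.array([1.0, 2.0])

# An orthogonal matrix preserves the 2-norm (pure rotation/reflection)
assert np.allclose(Q.T @ Q, np.eye(2))
assert np.isclose(np.linalg.norm(Q @ x), np.linalg.norm(x))

# A diagonal matrix scales each coordinate independently
assert np.allclose(D @ x, [3.0 * x[0], 0.5 * x[1]])
print("rotation preserves length; diagonal scales axes")
```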

## Matrix Norms

Since matrices are linear maps, we measure their size by how much they "stretch" vectors.

:::{prf:definition} Induced (Operator) Norm
:label: def-induced-norm

$$
\|A\| = \max_{\mathbf{x} \neq \mathbf{0}} \frac{\|A\mathbf{x}\|}{\|\mathbf{x}\|} = \max_{\|\mathbf{x}\| = 1} \|A\mathbf{x}\|
$$

The maximum stretching factor over all unit vectors.
:::

This definition works for any linear map between normed spaces; it is how we measure operators in functional analysis too.

| Name | Formula | Computation |
| --- | --- | --- |
| 1-norm | $\lVert A \rVert_1 = \max_j \sum_i \lvert a_{ij} \rvert$ | Maximum column sum |
| $\infty$-norm | $\lVert A \rVert_\infty = \max_i \sum_j \lvert a_{ij} \rvert$ | Maximum row sum |
| 2-norm | $\lVert A \rVert_2 = \sigma_{\max}(A)$ | Largest singular value |

Key properties:

- Consistency with vector norms: $\|A\mathbf{x}\| \leq \|A\|\,\|\mathbf{x}\|$
- Submultiplicativity: $\|AB\| \leq \|A\|\,\|B\|$

:::{note}
The 1-norm and $\infty$-norm are cheap (just sums). The 2-norm requires singular values, which is more expensive but geometrically natural.
:::
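The three formulas in the table can be sketched directly (NumPy assumed) and cross-checked against `np.linalg.norm`:

```python
import numpy as np

A = np.array([[1.0, -2.0],
              [3.0,  4.0]])

# 1-norm: maximum absolute column sum
norm1 = np.max(np.sum(np.abs(A), axis=0))
# inf-norm: maximum absolute row sum
norm_inf = np.max(np.sum(np.abs(A), axis=1))
# 2-norm: largest singular value
norm2 = np.max(np.linalg.svd(A, compute_uv=False))

assert np.isclose(norm1, np.linalg.norm(A, 1))
assert np.isclose(norm_inf, np.linalg.norm(A, np.inf))
assert np.isclose(norm2, np.linalg.norm(A, 2))

print(norm1, norm_inf, norm2)
```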

## The Condition Number

The norm $\|A\|$ measures the maximum stretching factor over all unit vectors. What about the minimum?

:::{prf:theorem} Stretch Interpretation of $\|A\|$ and $\|A^{-1}\|$
:label: thm-stretch

Let $A$ be a nonsingular matrix. Then

$$
\|A\| = \max_{\|\mathbf{x}\| = 1} \|A\mathbf{x}\| \qquad \text{and} \qquad \|A^{-1}\| = \frac{1}{\displaystyle \min_{\|\mathbf{x}\| = 1} \|A\mathbf{x}\|}.
$$

$\|A\|$ is the maximum stretch of a unit vector by the linear transformation $A$, and $\|A^{-1}\|$ is the reciprocal of the minimum stretch of a unit vector.
:::

:::{prf:proof}
The first identity follows from homogeneity: $\|A\mathbf{x}\| / \|\mathbf{x}\| = \|A(\mathbf{x}/\|\mathbf{x}\|)\|$. For the second, substitute $\mathbf{x} \mapsto A\mathbf{x}$ in the definition of $\|A^{-1}\|$; since $A$ is nonsingular, $A\mathbf{x}$ ranges over all nonzero vectors as $\mathbf{x}$ does:

$$
\begin{aligned}
\|A^{-1}\| &= \max_{\mathbf{x} \neq \mathbf{0}} \frac{\|A^{-1}\mathbf{x}\|}{\|\mathbf{x}\|} = \max_{\mathbf{x} \neq \mathbf{0}} \frac{\|A^{-1}A\mathbf{x}\|}{\|A\mathbf{x}\|} = \max_{\mathbf{x} \neq \mathbf{0}} \frac{\|\mathbf{x}\|}{\|A\mathbf{x}\|} \\
&= \max_{\|\mathbf{x}\| = 1} \frac{1}{\|A\mathbf{x}\|} = \frac{1}{\displaystyle \min_{\|\mathbf{x}\| = 1} \|A\mathbf{x}\|}
\end{aligned}
$$
:::

:::{prf:definition} Condition Number
:label: def-condition-number

The condition number of a nonsingular square matrix $A$ is

$$
\kappa(A) = \|A\| \, \|A^{-1}\|.
$$

By convention, $\kappa(A) = \infty$ if $A$ is singular.
:::

By the stretch interpretation above, the condition number has a clean geometric meaning:

$$
\kappa(A) = \frac{\text{maximum stretch of a unit vector}}{\text{minimum stretch of a unit vector}}
$$

A matrix with $\kappa(A) \approx 1$ stretches all directions roughly equally (like a rotation or uniform scaling). A matrix with $\kappa(A) \gg 1$ stretches some directions far more than others, making the linear system sensitive to perturbations.

### Geometric examples

:::{prf:example} Reading $\kappa(A)$ from a figure
:label: ex-condition-stretch1

The image below shows the unit circle (left) and its image under a $2 \times 2$ matrix $A$ (right).

Figure 1: The unit circle (left) mapped to an ellipse under $A$ (right). The semi-axes of the ellipse are the singular values of $A$: the maximum stretch is $\|A\| = 3/\sqrt{2}$ and the minimum stretch is $1/\sqrt{2}$.

The maximum stretch is $\|A\| = 3/\sqrt{2}$. The minimum stretch is $1/\sqrt{2}$, so $\|A^{-1}\| = \sqrt{2}$. The condition number is

$$
\kappa(A) = \frac{3/\sqrt{2}}{1/\sqrt{2}} = 3.
$$

This matrix is well-conditioned: the ellipse is only moderately elongated.
:::
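The stretch picture can be reproduced numerically. The matrix behind the figure is not given, so the sketch below builds a hypothetical stand-in whose singular values match the stretches read off the ellipse ($3/\sqrt{2}$ and $1/\sqrt{2}$), then recovers $\kappa(A)$ as the ratio $\sigma_{\max}/\sigma_{\min}$:

```python
import numpy as np

# Illustrative stand-in: construct A = U S U^T with the singular values
# read off the figure (the actual matrix behind the figure is unknown).
theta = np.pi / 4
U = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
S = np.diag([3 / np.sqrt(2), 1 / np.sqrt(2)])
A = U @ S @ U.T

sigma = np.linalg.svd(A, compute_uv=False)   # sorted: [sigma_max, sigma_min]
kappa = sigma[0] / sigma[-1]                  # max stretch / min stretch

assert np.isclose(kappa, np.linalg.cond(A, 2))   # agrees with NumPy's kappa
print(kappa)  # 3.0
```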


::::{prf:example} A more ill-conditioned matrix
:label: ex-condition-stretch2

Now a different $2 \times 2$ matrix:

:::{figure} /img/condition_stretch2.png
:label: fig-condition-stretch2

The unit circle (left) mapped to a more elongated ellipse under $A$ (right). The greater elongation reflects a larger ratio of maximum to minimum stretch, giving a higher condition number.
:::

The image is a much more elongated ellipse. From the figure, the maximum stretch is $\|A\| = 5\sqrt{2}$ and the minimum stretch is $2\sqrt{2}$, giving

$$
\kappa(A) = \frac{5\sqrt{2}}{2\sqrt{2}} = \frac{5}{2}
$$
::::

## Linear Systems

We seek to solve $A\mathbf{x} = \mathbf{b}$ where
$A \in \mathbb{R}^{n \times n}$ and
$\mathbf{x}, \mathbf{b} \in \mathbb{R}^n$. The basic question is: when does a
unique solution exist?

:::{prf:theorem} Invertibility Conditions
:label: thm-invertibility-conditions

For a matrix $A \in \mathbb{R}^{n \times n}$, the following are equivalent:

1. $A$ is **invertible** (i.e., $A^{-1}$ exists)
2. $A\mathbf{x} = \mathbf{b}$ has a **unique solution** for each $\mathbf{b}$
3. $A\mathbf{x} = \mathbf{0}$ has only the **trivial solution** $\mathbf{x} = \mathbf{0}$
4. $\det(A) \neq 0$
5. All **eigenvalues** of $A$ are non-zero
:::

Existence and uniqueness are settled by this theorem. The numerical questions are: how do we solve the system efficiently? and how sensitive is the solution to perturbations? The rest of this chapter develops the tools to answer both.
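Those two numerical questions can be previewed in a short sketch (NumPy assumed): solve a nearly singular system with `np.linalg.solve`, perturb the right-hand side slightly, and watch the relative error in the solution grow by up to a factor of $\kappa(A)$, the classic bound $\|\Delta\mathbf{x}\|/\|\mathbf{x}\| \leq \kappa(A)\,\|\Delta\mathbf{b}\|/\|\mathbf{b}\|$:

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 1.0001]])    # nearly singular -> large kappa
b = np.array([2.0, 2.0001])
x = np.linalg.solve(A, b)        # exact solution is [1, 1]

kappa = np.linalg.cond(A)

# Perturb b slightly and re-solve
db = np.array([0.0, 1e-5])
x_pert = np.linalg.solve(A, b + db)

rel_input = np.linalg.norm(db) / np.linalg.norm(b)
rel_output = np.linalg.norm(x_pert - x) / np.linalg.norm(x)

# The classic bound: relative output error <= kappa(A) * relative input error
assert rel_output <= kappa * rel_input * (1 + 1e-8)
print(f"kappa = {kappa:.1e}, input perturbation = {rel_input:.1e}, "
      f"output change = {rel_output:.1e}")
```

A tiny change in $\mathbf{b}$ (about $10^{-5}$ relative) moves the solution by about $10^{-1}$ relative here, exactly the amplification the condition number predicts.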