
Big Idea

Don’t guess how many Chebyshev coefficients $N$ you need to resolve a solution. Build the QR factorisation of the discretised system one column at a time, and stop as soon as the residual of the truncated solution drops below tolerance. The residual at every step is read off for free from the transformed right-hand side, and the almost-banded structure of the ultraspherical operator from §7 makes each Givens update $O(1)$ work. The result is a single-pass solver that returns both the solution and the smallest $N$ that resolves it.

From Function-Space to System Adaptivity

For a known function $f$, the adaptive truncation in Algorithm 1 already does what we want: sample, DCT, watch the coefficient tail, stop when it drops below tolerance. This section extends that idea to the unknown solution $u$ of the ultraspherical BVP system from §7, where we cannot compute the coefficient tail directly because $u$ is what we are solving for.

The substitute is the residual of the truncated linear system. As we build the QR factorisation of the discretised operator one column at a time, the residual of the solution truncated at column $N$ is read off for free from the transformed right-hand side. It plays the same role as the coefficient tail did for a known function: once the residual drops below tolerance, the current $N$ is sufficient and every additional column is wasted work. The almost-banded structure of the ultraspherical operator makes each Givens update $O(1)$, so the whole sweep costs $O(N)$.


The Discretised Problem

We use the same first-order BVP as in §7:

$$
u'(x) = f(x), \qquad u(-1) = \alpha, \qquad x \in [-1, 1].
$$

After ultraspherical discretisation at size $N$, the problem is the almost-banded linear system

$$
\mathcal{L}\, \mathbf{c} \;=\; \mathbf{g}, \qquad \mathcal{L} = \begin{pmatrix} \mathbf{b}^\top \\ \mathcal{D}_0 \end{pmatrix}, \qquad \mathbf{g} = \begin{pmatrix} \alpha \\ \mathcal{S}_0 \mathbf{f}_T \end{pmatrix},
$$

with $\mathbf{c} = (c_0, \ldots, c_N)^\top$ the Chebyshev coefficients of $u$. The first row is dense (the boundary condition); every other row is banded with bandwidth one (the differentiation operator). The $N$ we want is whatever makes the truncation error in $u$ smaller than the tolerance.

The naive approach picks $N$ in advance, solves once, hopes for the best, and starts over with $2N$ if the answer is unresolved. Adaptive QR avoids all of this.

The Residual Comes for Free

The whole adaptive solver hangs on a single observation: when you build the QR factorisation column by column with orthogonal row operations, the residual of the truncated solve is read off the rotated right-hand side at no extra cost.

State after $k$ elimination steps

Factor $\mathcal{L} = QR$ incrementally: process the columns one at a time, and after $k$ steps we have an orthogonal $Q_k$ (the product of whatever orthogonal row operations we used to triangularise the first $k$ columns; the choice will turn out to matter and we make it in §4). The matrix and right-hand side then take the partitioned form

$$
Q_k^\top\, \mathcal{L} \;=\; \begin{pmatrix} R_k & X_k \\ 0 & B_k \end{pmatrix}, \qquad \tilde{\mathbf{g}} \;:=\; Q_k^\top\, \mathbf{g} \;=\; \begin{pmatrix} \tilde{\mathbf{g}}_{1:k} \\ \tilde{\mathbf{g}}_{k+1:} \end{pmatrix}.
$$

We define $\tilde{\mathbf{g}} := Q_k^\top \mathbf{g}$, the rotated right-hand side, once and for all. Concretely, $\tilde{\mathbf{g}}$ is just $\mathbf{g}$ after the same orthogonal operations that triangularised the first $k$ columns of $\mathcal{L}$ are applied to its entries.

The block sizes are $R_k \in \mathbb{R}^{k \times k}$ (upper triangular, the columns we have finished), $X_k \in \mathbb{R}^{k \times (N+1-k)}$ (processed rows, columns yet to be touched), and $B_k \in \mathbb{R}^{(N+1-k) \times (N+1-k)}$ (the still-unprocessed bottom block). The rotated right-hand side splits into the head $\tilde{\mathbf{g}}_{1:k}$ that lives next to $R_k$, and the tail $\tilde{\mathbf{g}}_{k+1:}$ that has been rotated through but not yet “used” by any back-solve.

The truncated solve

Suppose we stopped now and reported a solution. The natural choice is

$$
\hat{\mathbf{c}} \;=\; \begin{pmatrix} R_k^{-1}\,\tilde{\mathbf{g}}_{1:k} \\ \mathbf{0} \end{pmatrix} \;\in\; \mathbb{R}^{N+1},
$$

i.e., back-solve in the finished triangular block and zero the rest. The remarkable fact is that we already know how good this $\hat{\mathbf{c}}$ is.

Theorem 1 (Residual from the transformed right-hand side)

The truncated solution $\hat{\mathbf{c}}$ above has Euclidean residual

$$
\big\| \mathcal{L}\, \hat{\mathbf{c}} - \mathbf{g} \big\|_2 \;=\; \big\| \tilde{\mathbf{g}}_{k+1:} \big\|_2,
$$

the norm of the untouched tail of the rotated right-hand side.

Proof 1

$Q_k$ is orthogonal, so it preserves Euclidean norms:

$$
\big\| \mathcal{L}\, \hat{\mathbf{c}} - \mathbf{g} \big\|_2 \;=\; \big\| Q_k^\top\bigl( \mathcal{L}\, \hat{\mathbf{c}} - \mathbf{g} \bigr) \big\|_2.
$$

Compute the right-hand side using the block decomposition. Since $\hat{\mathbf{c}}_{k+1:} = \mathbf{0}$, only the first block-column of $Q_k^\top \mathcal{L}$ contributes:

$$
Q_k^\top \mathcal{L}\, \hat{\mathbf{c}} \;=\; \begin{pmatrix} R_k & X_k \\ 0 & B_k \end{pmatrix} \begin{pmatrix} R_k^{-1}\,\tilde{\mathbf{g}}_{1:k} \\ \mathbf{0} \end{pmatrix} \;=\; \begin{pmatrix} \tilde{\mathbf{g}}_{1:k} \\ \mathbf{0} \end{pmatrix}.
$$

Subtracting $Q_k^\top \mathbf{g}$,

$$
Q_k^\top\bigl( \mathcal{L}\, \hat{\mathbf{c}} - \mathbf{g} \bigr) \;=\; \begin{pmatrix} \tilde{\mathbf{g}}_{1:k} \\ \mathbf{0} \end{pmatrix} \;-\; \begin{pmatrix} \tilde{\mathbf{g}}_{1:k} \\ \tilde{\mathbf{g}}_{k+1:} \end{pmatrix} \;=\; \begin{pmatrix} \mathbf{0} \\ -\tilde{\mathbf{g}}_{k+1:} \end{pmatrix},
$$

whose norm is $\|\tilde{\mathbf{g}}_{k+1:}\|_2$.

The geometry of the proof is the whole point: orthogonal transformations preserve norms and line up the partitions so that the residual lives entirely in the bottom block. We did not solve any linear system to read this number, and we did not even need to look at $B_k$. The information was banked the moment we rotated $\mathbf{g}$ alongside $\mathcal{L}$.

In particular, the moment $\|\tilde{\mathbf{g}}_{k+1:}\|_2$ drops below tolerance, the truncated solution at size $k$ is good enough, and every column to the right of $k$ can be left alone.
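To see Theorem 1 numerically, here is a minimal check (a sketch: a random dense matrix stands in for $\mathcal{L}$, and a one-shot NumPy QR stands in for the incremental sweep; the identity holds for any orthogonal $Q$ that triangularises the first $k$ columns):

```python
import numpy as np
from scipy.linalg import solve_triangular

rng = np.random.default_rng(0)
n = 12
L = rng.standard_normal((n, n))   # dense stand-in for the discretised operator
g = rng.standard_normal(n)

Q, R = np.linalg.qr(L)            # first k rows of R and Q^T g match the
g_tilde = Q.T @ g                 # state after k elimination steps

for k in (3, 6, 9):
    # Truncated solve: back-substitute in the finished k x k block, zero the rest.
    c_hat = np.zeros(n)
    c_hat[:k] = solve_triangular(R[:k, :k], g_tilde[:k])
    residual = np.linalg.norm(L @ c_hat - g)   # computed the hard way
    tail = np.linalg.norm(g_tilde[k:])         # read off the rotated rhs
    print(f"k={k:2d}   residual = {residual:.12e}   tail norm = {tail:.12e}")
```

The two printed columns agree to machine precision for every $k$, with no reference to the unprocessed block.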

Building the QR Factorisation One Column at a Time

The orthogonal row operations we use are Givens rotations: $2 \times 2$ orthogonal matrices

$$
G(c, s) = \begin{pmatrix} c & s \\ -s & c \end{pmatrix}, \qquad c^2 + s^2 = 1, \qquad G \begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} \sqrt{a^2 + b^2} \\ 0 \end{pmatrix},
$$

each acting on a pair of rows to zero one chosen entry, with the same rotation applied to $\mathbf{g}$. The full mechanics — how the $c, s$ are computed, how rotations compose — are in Givens Rotations and Streaming QR. Here we only need the consequences for our almost-banded $\mathcal{L}$.
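For reference, the primitive itself is tiny. A minimal sketch, using `np.hypot` for the overflow-safe $\sqrt{a^2 + b^2}$ (the helper names are illustrative):

```python
import numpy as np

def givens(a, b):
    """Return (c, s) such that [[c, s], [-s, c]] @ [a, b] = [r, 0]."""
    r = np.hypot(a, b)       # sqrt(a^2 + b^2) without spurious overflow
    if r == 0.0:
        return 1.0, 0.0      # nothing to rotate
    return a / r, b / r

def rotate_rows(c, s, row_i, row_j):
    """Apply G(c, s) to a pair of rows in place (row_j receives the zero)."""
    row_i[:], row_j[:] = c * row_i + s * row_j, -s * row_i + c * row_j
```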

Why Givens, not Householder

QR can equally be built with Householder reflectors, which annihilate a whole column below the diagonal in one shot. For dense matrices that is the textbook choice; library implementations (LAPACK geqrf) are heavily tuned and slightly more robust numerically. Asymptotically, both methods can be made to run in $O((m+K)^2)$ per column for an almost-banded matrix, hence $O(N)$ total. So the choice is not about which is faster in big-$O$. It is about which is the right primitive for streaming: process one column, read the residual, stop or continue.

On that count Givens wins on three practical points.

  1. Each rotation is self-contained. A Givens rotation is two scalars $(c, s)$ and a pair of row indices $(i, j)$. Once it is applied, those two rows are updated and nothing else has to be tracked. Householder reflectors, in contrast, are length-$\nu$ vectors stored explicitly; bringing a fresh row “up to date” means replaying the relevant reflectors against it in the right order. The bookkeeping is heavier when columns and rows arrive on the fly.

  2. The residual estimate falls out for free. Each Givens rotation is also applied to two scalar entries of $\mathbf{g}$. After processing column $j$ the head $\tilde{\mathbf{g}}_{1:j}$ is locked in and the tail norm is just an accumulator update — $O(1)$ per column for the convergence check (see the Algorithm below). The same is possible with Householder, but with a wider inner loop.

  3. Append-only growth. Adding column $j+1$ later means a few new Givens rotations between specific pairs of rows. Nothing about columns $0, \ldots, j$ has to be revisited. This is exactly the streaming pattern of §8, used there for online least-squares; here we apply the same machinery to a different left-hand side.

The take-away: pick Givens because the algorithm is streaming, not because it is asymptotically faster. With proper data structures (banded storage for the working rows, dense storage for the $K$ boundary rows kept apart) both methods can hit $O(N)$, but Givens is the cleaner primitive for “process one column, ask if we’re done, repeat”.

Watching the factorisation grow

The streaming algorithm in spirit looks just like funpy’s StreamingQR: a row generator that hands out rows of $\mathcal{L}$ on demand, an active-row buffer that is rotated through Givens, and a running residual estimate read off the rotated $\mathbf{g}$. We implement a stripped-down version on the variable-coefficient problem

$$
(1 + x)\, u'(x) + u(x) = f(x), \qquad u(-1) = \alpha,
$$

a slight generalisation of the BVP from §7. Using the multiplication operator from §7, the discretised operator is

$$
\mathcal{L} \;=\; \begin{pmatrix} \mathbf{b}^\top \\[0.3ex] M_1[1+x]\,\mathcal{D}_0 + \mathcal{S}_0 \end{pmatrix},
$$

with $\mathbf{b}_k = (-1)^k$. One dense boundary row ($K = 1$) on top of an interior of bandwidth two, lower bandwidth $m_L = 1$.

Source: ultraspherical operators $\mathcal{D}_0$, $\mathcal{S}_0$, $M_1[x]$, and the assembled $\mathcal{L}$ for $(1+x)u' + u = f$.
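The hidden cell is not reproduced here, but a dense sketch of those operators follows from three standard identities, writing $C^{(1)}_k = U_k$ (Chebyshev polynomials of the second kind): $T_k' = k\,U_{k-1}$; $T_0 = U_0$, $T_k = (U_k - U_{k-2})/2$; and $x\,U_k = (U_{k-1} + U_{k+1})/2$. The function names are illustrative, not funpy’s API, and dense matrices stand in for the banded storage a real implementation would use:

```python
import numpy as np

def D0(n):
    """Differentiation T -> C^(1): (d/dx) T_k = k U_{k-1}."""
    D = np.zeros((n, n))
    for k in range(1, n):
        D[k - 1, k] = k
    return D

def S0(n):
    """Conversion T -> C^(1): T_0 = U_0, T_k = (U_k - U_{k-2}) / 2."""
    S = np.zeros((n, n))
    S[0, 0] = 1.0
    for k in range(1, n):
        S[k, k] = 0.5
    for k in range(2, n):
        S[k - 2, k] = -0.5
    return S

def M1x(n):
    """Multiplication by x in the C^(1) basis: x U_k = (U_{k-1} + U_{k+1}) / 2."""
    M = np.zeros((n, n))
    for k in range(n - 1):
        M[k + 1, k] = 0.5
        M[k, k + 1] = 0.5
    return M

def assemble_L(n):
    """Almost-banded operator for (1 + x) u' + u = f with u(-1) = alpha."""
    b = (-1.0) ** np.arange(n)                        # u(-1) = sum_k c_k (-1)^k
    interior = (np.eye(n) + M1x(n)) @ D0(n) + S0(n)   # M1[1+x] D0 + S0
    return np.vstack([b, interior[:-1]])              # drop last row: square system
```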


To watch the factorisation literally grow, snapshot the working matrix after a handful of column steps. The upper-left $j \times j$ block becomes upper triangular (that is $R_j$); the rest of the matrix is either untouched (rows below the active band) or has been rotated up into the $X_j$ region above the diagonal.

Source: dense column-by-column Givens sweep with snapshots, used for visualisation only.

*(Figure: five snapshots of the working matrix after successive column steps.)*

The vertical red line marks the boundary between finished columns (left) and pending columns (right). Two features stand out. First, the upper-left $j \times j$ block is upper triangular: that is $R_j$. Second, no fill-in escapes a narrow band below the diagonal. Each new column contributes a small, bounded number of nonzeros to the working state; the cost of one step does not depend on how many columns we have already processed.

The streaming version produces the same $R$ one row at a time without storing the full matrix. Its pieces are: a row generator (get_row(i), get_rhs(i)) that materialises rows on demand, a full-length rotated right-hand side $\tilde{\mathbf{g}}$ kept in sync with the row rotations, and a loop that pulls rows in as they are needed, rotates them, checks $\|\tilde{\mathbf{g}}_{j+1:}\|_2$ directly, and stops when the residual drops below tolerance. Holding $\tilde{\mathbf{g}}$ explicitly (rather than computing $\sqrt{\|\mathbf{g}\|^2 - \|\text{head}\|^2}$) avoids catastrophic cancellation when the residual approaches machine precision.

Source: streaming Givens QR. The actual algorithm.
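The actual implementation lives in the hidden cell; a stripped-down sketch of the same loop is below. Two simplifications keep it short: rows are full-length dense vectors (so each rotation costs $O(N_{\max})$ rather than $O(m+K)$; banded storage restores the cheap updates), and all of $\mathbf{g}$ is materialised up front so the tail norm can be read directly.

```python
import numpy as np

def streaming_qr_solve(get_row, get_rhs, n_max, band, tol):
    """Column-by-column Givens QR with early stopping (sketch).

    get_row(i) materialises row i of L on demand (length n_max);
    get_rhs(i) returns entry i of g.  `band` bounds how far below the
    diagonal column j can be nonzero (m_L + K for our operator).
    Returns (c_hat, k): the truncated solution and the size it stopped at.
    """
    # Full rotated right-hand side, kept in sync with every rotation,
    # so the residual tail is available directly (Theorem 1).
    g_t = np.array([get_rhs(i) for i in range(n_max)], dtype=float)
    rows = []                                     # lazily fetched working rows
    for j in range(n_max):
        # Fetch every row that can have a nonzero in column j.
        while len(rows) < min(j + band + 1, n_max):
            rows.append(np.asarray(get_row(len(rows)), dtype=float))
        # Eliminate column j below the diagonal; rotate g_t alongside.
        for i in range(j + 1, len(rows)):
            a, b = rows[j][j], rows[i][j]
            if b == 0.0:
                continue
            r = np.hypot(a, b)
            c, s = a / r, b / r
            rows[j], rows[i] = c * rows[j] + s * rows[i], -s * rows[j] + c * rows[i]
            g_t[j], g_t[i] = c * g_t[j] + s * g_t[i], -s * g_t[j] + c * g_t[i]
        # Residual of the size-(j+1) truncated solve, for free.
        if np.linalg.norm(g_t[j + 1:]) < tol:
            break
    k = j + 1
    R = np.triu([row[:k] for row in rows[:k]])    # finished triangular block
    return np.linalg.solve(R, g_t[:k]), k
```

If the loop exits at column $j$, only rows $0, \ldots, j + m_L + K$ were ever materialised; everything further down stays unfetched.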


A demo. We pick a smooth manufactured solution $u_{\rm exact}(x) = \cos(8x) + 0.3\, e^{\sin(3x)}$, compute the matching $f$ and $\alpha$, and expose $\mathcal{L}$ and $\mathbf{g}$ to the solver via row-generator functions.

Source: demo setup, residual plot.
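A sketch of the corresponding setup, wiring the manufactured solution through the illustrative `assemble_L`, `S0`, and `streaming_qr_solve` defined above (not funpy’s API); the notebook’s own run reports the output line below.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

n_max, tol = 200, 1e-12

u_exact = lambda x: np.cos(8 * x) + 0.3 * np.exp(np.sin(3 * x))
du_exact = lambda x: -8 * np.sin(8 * x) + 0.9 * np.cos(3 * x) * np.exp(np.sin(3 * x))
f = lambda x: (1 + x) * du_exact(x) + u_exact(x)   # manufactured right-hand side
alpha = u_exact(-1.0)

# Chebyshev-T coefficients of f, then g = (alpha, S0 f_T) as in the text.
f_T = C.Chebyshev.interpolate(f, n_max - 1).coef
g = np.concatenate([[alpha], (S0(n_max) @ f_T)[:-1]])
L = assemble_L(n_max)

c_hat, n_opt = streaming_qr_solve(lambda i: L[i], lambda i: g[i],
                                  n_max, band=2, tol=tol)   # band = m_L + K = 2

# Check against the exact solution on a fine grid.
x = np.linspace(-1.0, 1.0, 1001)
err = np.max(np.abs(C.chebval(x, c_hat) - u_exact(x)))
print(f"N_opt = {n_opt},  max error = {err:.2e}")
```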

streaming QR stopped at N_opt = 39,  residual = 8.49e-13
*(Figure: residual $\|\tilde{\mathbf{g}}_{j+1:}\|_2$ versus column index for the streaming solve.)*

The residual decays geometrically (the spectral rate inherited from the smoothness of $f$) and crosses the tolerance well before we exhaust $N_{\max} = 200$. The solver never asked for rows beyond $N_{\rm opt}$: those remained unfetched and unrotated. This is the whole point of the streaming pattern.

A production-grade version (funpy’s StreamingQR) layers two more ideas on top:

  1. Banded storage. The $K$ boundary rows live in a separate dense buffer; the active band is held in a compact $(m_L + m_U + K)$-wide layout. With this, each Givens rotation does $O(m + K)$ work rather than $O(N)$, and the whole sweep is genuinely $O(N (m+K)^2)$.

  2. Stored rotations. Each $(i, j, c, s)$ is stashed so that a new right-hand side can be rotated through the existing factorisation in one pass, no re-factoring needed; see the sketch below.

Both extras are bookkeeping, not new ideas. The skeleton above is the algorithm; everything else is data structures.
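The second extra really is a one-pass replay. A sketch, assuming the sweep recorded each rotation as an `(i, j, c, s)` tuple in application order:

```python
import numpy as np

def rotate_new_rhs(rotations, g_new):
    """Rotate a fresh right-hand side through the stored factorisation."""
    g = np.array(g_new, dtype=float)
    for i, j, c, s in rotations:    # replay in the original order
        g[i], g[j] = c * g[i] + s * g[j], -s * g[i] + c * g[j]
    return g
```

A back-solve against the already-computed $R$ then gives the new solution without touching $\mathcal{L}$ again.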

A runnable, self-contained version of the ultraspherical operators plus a streaming solver lives in the Ultraspherical Toolbox notebook.

See Also