
Big Idea

Smoothness is compressibility. The smoother $f$ is in physical space, the more localised its Chebyshev (equivalently, Fourier) coefficients are in frequency space, so a handful of coefficients already captures the function to high accuracy. This is one face of the uncertainty principle: what is spread out in $x$ is localised in $k$, and vice versa.

Quantitatively: analytic functions give geometric coefficient decay $|c_k| = O(\rho^{-k})$, $C^r$ functions give algebraic decay $|c_k| = O(k^{-r-1})$, and discontinuities give no decay at all and produce Gibbs oscillations that do not vanish as $n \to \infty$. Reading a coefficient-decay plot is therefore reading the regularity of $f$, and deciding how many coefficients are worth keeping.

Four Representative Functions

Before stating any theorem, consider four test functions on $[-1, 1]$ chosen so that their regularities span the spectrum from analytic to discontinuous:

  1. $\sin(5x)$ is entire: analytic on all of $\mathbb{C}$. It has no singularities anywhere in the complex plane.

  2. $1/(1 + 25x^2)$ is real-analytic on $[-1, 1]$ but has two complex poles at $\pm i/5$, sitting close to the real interval.

  3. $|x|^3$ belongs to $C^2([-1,1])$: its first and second derivatives are continuous, but the third derivative jumps at $x = 0$.

  4. $\mathrm{sign}(x)$ is discontinuous at $x = 0$. It is of bounded variation but has no derivatives at all in the usual sense.

We compute the Chebyshev interpolant of each at $n = 1024$ and plot $|c_k|$ on a log-y axis.

[Figure: Chebyshev coefficients $|c_k|$ of the four test functions at $n = 1024$, log-y axis, one panel per function.]
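For readers who want to reproduce the numbers behind the figure, here is a minimal sketch of the computation: sample $f$ at the Chebyshev–Lobatto nodes and recover the coefficients with a DCT written as a real FFT. The helper name `cheb_coeffs` and the reported index blocks are our own choices, not part of any library.

```python
import numpy as np

def cheb_coeffs(f, n):
    """Coefficients of the Chebyshev interpolant of f through the n+1
    Chebyshev-Lobatto nodes x_j = cos(pi*j/n), via a DCT-I implemented
    as a real FFT of the even extension of the samples."""
    v = f(np.cos(np.pi * np.arange(n + 1) / n))
    c = np.fft.rfft(np.concatenate([v, v[-2:0:-1]])).real[: n + 1] / n
    c[0] /= 2    # first and last coefficients are halved
    c[-1] /= 2   # in the DCT-I normalisation
    return c

tests = [("sin(5x)", lambda x: np.sin(5 * x)),
         ("1/(1+25x^2)", lambda x: 1 / (1 + 25 * x**2)),
         ("|x|^3", lambda x: np.abs(x) ** 3),
         ("sign(x)", np.sign)]
for name, f in tests:
    c = cheb_coeffs(f, 1024)
    # report block maxima so the odd/even zero pattern of symmetric
    # functions does not hide the true decay
    print(f"{name:12s}  max|c_k|, 64<=k<128: {np.abs(c[64:128]).max():.1e}"
          f"   512<=k<=1024: {np.abs(c[512:]).max():.1e}")
```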

The four panels show qualitatively different behaviour. The two analytic examples decay geometrically, visible as straight lines on the log-y axis, though with very different slopes. The $C^2$ example $|x|^3$ decays more slowly and curves rather than running straight. The discontinuous sign function barely decays at all. The theorems below explain the rates and when each regime applies.

The Coefficient Decay Theorem

For Lipschitz $f$ on $[-1,1]$ the Chebyshev series

$$f(x) = \sum_{k=0}^\infty c_k T_k(x), \qquad c_k = \frac{2}{\pi} \int_{-1}^{1} \frac{f(x)\, T_k(x)}{\sqrt{1-x^2}}\, dx \quad (k \ge 1)$$

converges uniformly. The interesting question is how fast.

Both algebraic statements below (Theorem 1 and part 2 of Corollary 1) use two regularity notions that are standard in real analysis but deserve a quick reminder.

Definition 1 (Bounded variation and absolute continuity)

A function $f : [a, b] \to \mathbb{R}$ has bounded variation $V$ if

$$V_a^b(f) \;=\; \sup \sum_{i=1}^N |f(t_i) - f(t_{i-1})| \;<\; \infty,$$

where the supremum ranges over all finite partitions $a = t_0 < t_1 < \cdots < t_N = b$. Intuitively, $V$ is the total up-and-down distance that the graph of $f$ traces out.

$f$ is absolutely continuous on $[a, b]$ if there exists an integrable function $g$ with

$$f(x) \;=\; f(a) + \int_a^x g(t)\, dt, \qquad x \in [a, b].$$

This $g$ is the derivative $f'$ at almost every point. Absolute continuity is the regularity class in which the fundamental theorem of calculus holds.

Every absolutely continuous $f$ is of bounded variation, with $V = \int_a^b |f'(t)|\, dt$. Every $C^1$ function is absolutely continuous.

Theorem 1 (Algebraic decay)

If $f, f', \ldots, f^{(r-1)}$ are absolutely continuous on $[-1,1]$ and $f^{(r)}$ has bounded variation $V$, then for $k > r$

$$|c_k| \le \frac{2V}{\pi (k-r)^{r+1}}.$$

Proof 1

Sketch. Substituting $x = \cos\theta$ writes the Chebyshev coefficient as a Fourier cosine integral,

$$c_k = \frac{2}{\pi} \int_0^\pi f(\cos\theta)\, \cos(k\theta)\, d\theta.$$

The integrand is the product of the $2\pi$-periodic even function $g(\theta) = f(\cos\theta)$ with $\cos(k\theta)$. One integration by parts gives

$$\int_0^\pi g(\theta) \cos(k\theta)\, d\theta = \left[\frac{g(\theta) \sin(k\theta)}{k}\right]_0^\pi - \frac{1}{k}\int_0^\pi g'(\theta) \sin(k\theta)\, d\theta,$$

and the boundary term vanishes because $\sin(k\pi) = \sin(0) = 0$. So the integral has gained a factor of $1/k$ at the cost of one derivative on $g$. Iterating $r$ more times uses the symmetry of $g$ across $\theta = 0, \pi$ to kill successive boundary terms, and transfers $r$ derivatives in total onto $g$. The final step bounds the remaining integrand by the total variation of $f^{(r)}$ via a Riemann–Stieltjes estimate. Full details are in Trefethen (2013), Ch. 7.

The $k - r$ in the denominator is a shift that just reflects the finite number of initial indices $k \le r$ where the $r$-fold integration by parts has not yet taken hold. For large $k$ it is negligible: $k - r \sim k$, and the bound reads $|c_k| \lesssim V / k^{r+1}$, the rate one carries around in practice. For $k$ just above $r$ the denominator is tiny and the bound is uninformative, but those first few coefficients never drive the approximation error anyway. Once the function is resolved (see the adaptive-$N$ algorithm at the end of this chapter), $k$ is large enough that the distinction between $k^{r+1}$ and $(k - r)^{r+1}$ disappears into the $O(\cdot)$.

Theorem 2 (Geometric decay)

If $f$ is analytic on $[-1,1]$ and extends analytically to the Bernstein ellipse $\mathcal{E}_\rho$ (foci $\pm 1$, semi-axis sum $\rho > 1$), with $|f| \le M$ there, then

$$|c_k| \le 2 M \rho^{-k}.$$

The geometric case is proved by shifting the contour in the defining integral off $[-1,1]$ and onto $\mathcal{E}_\rho$ in the complex plane; see Trefethen (2013), Ch. 8.

Smoothness Is Compression

The decay theorems above have a conceptual payoff worth dwelling on before moving to error bounds. The uncertainty principle says that a function cannot be simultaneously localised in physical space ($x$) and in frequency space ($k$). Narrow in one means spread in the other. For Chebyshev (or Fourier) series this is visible: watch the spectrum of a Gaussian bump as we shrink its width $\sigma$.

[Figure: Gaussian bumps of decreasing width $\sigma$ (left column) and their Chebyshev spectra (right column).]

As the bump narrows, its spectrum spreads. A wide, slowly-varying bump ($\sigma = 0.5$) is captured by the first few dozen Chebyshev coefficients. A thin, nearly-delta bump ($\sigma = 0.02$) spreads its coefficients across the full range of $k$ we sampled. In the limit $\sigma \to 0$ the bump becomes a delta and its spectrum is flat, with no compressible tail at all. Concretely, the spatial scale $\sigma$ on the left is paired with a spectral scale of order $1/\sigma$ on the right: shrinking $\sigma$ by a factor of 25 (from 0.5 to 0.02) spreads the spectrum by roughly the same factor. Localisation in $x$ is paid for with spreading in $k$.
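The $\sigma \leftrightarrow 1/\sigma$ pairing is easy to check numerically. A sketch reusing the `cheb_coeffs` helper from earlier; the $10^{-12}$ cutoff for "effectively zero" is an arbitrary choice:

```python
import numpy as np

def cheb_coeffs(f, n):
    # Chebyshev coefficients via DCT-I, as in the earlier sketch
    v = f(np.cos(np.pi * np.arange(n + 1) / n))
    c = np.fft.rfft(np.concatenate([v, v[-2:0:-1]])).real[: n + 1] / n
    c[0] /= 2; c[-1] /= 2
    return c

for sigma in [0.5, 0.1, 0.02]:
    c = cheb_coeffs(lambda x: np.exp(-x**2 / (2 * sigma**2)), 4096)
    k_eff = np.max(np.nonzero(np.abs(c) > 1e-12)[0])  # last coefficient above cutoff
    print(f"sigma = {sigma:5.2f}   spectrum extends to k ~ {k_eff}")
```

The printed bandwidth grows roughly like $1/\sigma$, the quantitative form of the trade-off described above.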

Put together with the decay theorems, this is a direct compression picture. The bump-and-spectrum trade-off above was localisation of the function value. The same trade-off applies more generally to features of the function such as its derivatives. The more derivatives $f$ has, the less sharp any local feature can be, and the more compressible the spectrum becomes. An analytic $f$ on $[-1, 1]$ has no sharp features at any order and is captured to machine precision by $O(\log(1/\varepsilon)/\log\rho)$ Chebyshev coefficients, a few dozen numbers for $\varepsilon = 10^{-16}$. A $C^r$ function has bounded derivatives up to order $r$ but a kink at order $r+1$, and needs $O(\varepsilon^{-1/r})$ coefficients. A jump is the extreme case: zero width, so the derivative is a delta, and the bump-and-spectrum logic applied to the derivative says the spectrum spreads maximally. Discontinuous functions are not compressible in this basis at all. The concrete picture of this failure mode is the Gibbs phenomenon, which we look at in §The Gibbs Phenomenon after we have the approximation-error bounds in hand.

Where the compression count comes from

The coefficient counts above look like estimates on computed quantities, but they are genuinely a priori: they depend only on regularity data about $f$ (the Bernstein parameter $\rho$ for analytic $f$, or the derivative order $r$ and total variation $V$ for $C^r$ functions), not on any computed coefficients. The chain of implication is

  1. Assume regularity of $f$.

  2. Decay theorems Theorem 1 and Theorem 2 bound $|c_k|$ from the regularity data.

  3. Uniform-error theorem Theorem 3 bounds $\|f - P_n f\|_\infty$ by the tail $\sum_{k > n} |c_k|$.

  4. Sum the tail with the decay bound to get a closed-form error bound as a function of $n$.

  5. Invert: given tolerance $\varepsilon$, solve for the smallest $N$ for which step 4's bound lies below $\varepsilon$.

For the analytic case, step 4 gives $\|f - P_N f\|_\infty \le \frac{2 M}{\rho - 1}\rho^{-N}$, and step 5 asks $\rho^{-N} \le \varepsilon (\rho - 1)/(2 M)$, i.e.

$$N \;\ge\; \frac{\log\!\big(2 M / [\varepsilon(\rho - 1)]\big)}{\log \rho} \;=\; O\!\left(\frac{\log(1/\varepsilon)}{\log \rho}\right).$$

No $c_k$ is ever actually computed in this derivation. We only need $\rho$ and $M$ for the function we plan to approximate. The same inversion on the $C^r$ bound from Corollary 1 yields $N = O(\varepsilon^{-1/r})$. These are the "compression counts" stated above.
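The inversion itself is a one-liner. Here is a sketch for the Runge function $1/(1+25x^2)$: its poles at $\pm i/5$ fix the limiting ellipse $\rho^* = (1+\sqrt{26})/5$, and since $|f|$ blows up on that ellipse we back off slightly; the specific back-off factor and the crude bound $M = 40$ are assumptions for the sketch, not values from the text:

```python
import numpy as np

def N_analytic(M, rho, eps):
    """Smallest N with 2*M/(rho-1) * rho**(-N) <= eps (step 5 above)."""
    return int(np.ceil(np.log(2 * M / (eps * (rho - 1))) / np.log(rho)))

rho_star = (1 + np.sqrt(26)) / 5       # limiting ellipse through the poles +-i/5
rho = 1 + 0.95 * (rho_star - 1)        # slightly smaller ellipse (assumption)
M = 40.0                               # crude bound on |f| over that ellipse (assumption)
for eps in [1e-4, 1e-8, 1e-16]:
    print(f"eps = {eps:.0e}  ->  N >= {N_analytic(M, rho, eps)}")
```

Because $\varepsilon$ sits inside a logarithm, even the crude choice of $M$ shifts $N$ by only a handful of coefficients.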

A view from Sobolev theory

The same correspondence has a formal home in functional analysis. The cleanest statement is for periodic $f$ on $[0, 2\pi]$ with Fourier coefficients $\hat f_k$, where membership in the Sobolev space $H^s$ is equivalent to

$$\sum_k |k|^{2s}\, |\hat f_k|^2 \;<\; \infty.$$

Roughly, $f$ has $s$ derivatives in $L^2$ iff $|\hat f_k|$ decays faster than $|k|^{-s - 1/2}$. The non-periodic case on an interval like $[-1, 1]$ obeys the same dictionary with the Fourier coefficients replaced by Chebyshev coefficients under $x = \cos\theta$: the substitution turns Chebyshev expansions on $[-1, 1]$ into Fourier cosine expansions on $[0, \pi]$, and the two regularity-decay stories are the same. Either way: analytic-on-a-strip $\leftrightarrow$ geometric decay, $C^r$ $\leftrightarrow$ algebraic decay, jump $\leftrightarrow$ no decay.

Why this is universal

The uncertainty principle is not a quirk of Chebyshev series. It is the same duality behind every modern lossy compressor: JPEG quantises a block-wise DCT of the image and throws away the small high-frequency coefficients that smooth regions produce few of; MP3 does the same to audio after a psychoacoustic reweighting; MP4/H.264 transforms blocks of video frames the same way. In each case, “smoothness” of the signal (continuous skin tones, sustained tones, slowly-moving scenes) is what makes the DCT spectrum sparse, and sparse spectra are what compress.

A close relative is the SVD (and its statistical cousin PCA). Instead of fixing the basis a priori (a DCT, Chebyshev, or Fourier basis), SVD learns the optimal basis for a given matrix or data set. The singular values then play the role of the coefficients, and for smooth data (slowly-varying images, correlated measurements) they decay fast so a low-rank truncation captures most of the information. The principle is the same: smooth data has a short tail in some basis, and that tail is what compression throws away. SVD differs from JPEG/MP3 only in that the basis is data-adapted rather than fixed.
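A toy illustration of that singular-value decay, with the matrix size and the sampled kernel chosen arbitrarily for the sketch: compare a grid sample of a smooth bivariate function against white noise.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 200)
smooth = 1 / (1 + (x[:, None] - x[None, :])**2)   # smooth kernel, sampled on a grid
noise = rng.standard_normal((200, 200))           # no smoothness at all

for name, A in [("smooth kernel", smooth), ("white noise", noise)]:
    s = np.linalg.svd(A, compute_uv=False)
    s /= s[0]                                      # normalise by the top singular value
    print(f"{name:14s}  sigma_10/sigma_0 = {s[10]:.1e}   sigma_50/sigma_0 = {s[50]:.1e}")
```

The smooth kernel's singular values plunge geometrically, so a rank-10 truncation already captures it to high accuracy; the noise matrix has no short tail in any basis.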

The same duality also underlies the Nyquist sampling theorem for band-limited signals and adaptive mesh refinement in FEM, where refinement is concentrated on the regions where the solution lacks smoothness. Approximation, compression, and sampling are three faces of the same theorem.

The rest of the chapter turns this compression picture into concrete error bounds (Theorem 3, Corollary 1) and an adaptive algorithm (Algorithm 1) that realises them without any a priori knowledge of $f$'s regularity.

The Approximation Error

Theorem 3 (Uniform error bound for the Chebyshev projection)

If the Chebyshev series of $f$ converges uniformly on $[-1, 1]$, then the Chebyshev projection $P_n f = \sum_{k=0}^n c_k T_k$ from Definition 4 satisfies

$$\|f - P_n f\|_\infty \;\le\; \sum_{k=n+1}^\infty |c_k|.$$

Proof 2

$f - P_n f = \sum_{k > n} c_k T_k$ and $|T_k(x)| \le 1$ on $[-1, 1]$. Applying the triangle inequality term by term gives the bound.

Applying the decay theorems to the tail yields rates.

Corollary 1 (Convergence rates)

Let $P_n f$ denote the Chebyshev projection of $f$.

  1. If $f$ is analytic on the Bernstein ellipse $\mathcal{E}_\rho$ with $|f| \le M$ there, then

    $$\|f - P_n f\|_\infty \;\le\; \frac{2 M}{\rho - 1}\, \rho^{-n}.$$
  2. If $f^{(r-1)}$ is absolutely continuous and $f^{(r)}$ has bounded variation $V$ with $r \ge 1$, then for $n > r$

    $$\|f - P_n f\|_\infty \;\le\; \frac{2V}{\pi\, r\, (n - r)^r}.$$

Proof 3

(1) By Theorem 2, $|c_k| \le 2 M \rho^{-k}$. Summing the geometric tail,

$$\sum_{k > n} 2 M \rho^{-k} \;=\; \frac{2 M\, \rho^{-n}}{\rho - 1}.$$

(2) By Theorem 1, $|c_k| \le 2V / (\pi (k-r)^{r+1})$. The integral test gives

$$\sum_{k > n} \frac{1}{(k-r)^{r+1}} \;\le\; \int_n^\infty \frac{dx}{(x-r)^{r+1}} \;=\; \frac{1}{r\, (n-r)^r}.$$

Multiplying by $2V/\pi$ yields the stated bound.

Convergence in action

Before bridging projection to interpolant, check whether the predicted rate actually shows up on a concrete example. Take $f(x) = 1/(1 + 25x^2)$, analytic with poles at $\pm i/5$ and Bernstein parameter $\rho \approx 1.22$. The corollary predicts $\|f - P_n f\|_\infty = O(\rho^{-n})$ for the projection. Below we plot $f$ and its Chebyshev interpolant $p_n$ at two values of $n$, shading $|f - p_n|$, and ask whether the interpolant shows the same rate.

[Figure: the Runge function and its Chebyshev interpolants at two values of $n$, with the error $|f - p_n|$ shaded.]

Doubling $n$ from 8 to 16 shrinks the maximum error by roughly a factor of $\rho^8 \approx 5$, matching the projection-rate prediction. The interpolant tracks the projection rate very closely. We will explain why, through the Lebesgue constant, in the subsection after next.
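The experiment is easy to repeat at more values of $n$. A minimal sketch using NumPy's built-in Chebyshev helpers (`chebinterpolate` samples at Chebyshev points of the first kind rather than the Lobatto points used earlier, a detail that does not change the rate):

```python
import numpy as np
from numpy.polynomial import chebyshev as C

f = lambda x: 1 / (1 + 25 * x**2)
rho = (1 + np.sqrt(26)) / 5        # Bernstein parameter for poles at +-i/5
xx = np.linspace(-1, 1, 5001)      # dense grid for the sup-norm estimate
for n in [8, 16, 32, 64, 128]:
    c = C.chebinterpolate(f, n)    # degree-n Chebyshev interpolant
    err = np.max(np.abs(f(xx) - C.chebval(xx, c)))
    print(f"n = {n:3d}   error = {err:.2e}   rho^-n = {rho**-n:.2e}")
```

The two columns track each other over many orders of magnitude, which is the geometric rate made visible.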

The Gibbs Phenomenon

Before taking the interpolant story further, look at the other end of the regularity spectrum. Do we really need the bounded-variation hypothesis to get any decay at all? The sign function is the limiting case. It has bounded variation (one jump of height 2), so Theorem 1 applies with $r = 0$ and gives

$$|c_k| \le \frac{4}{\pi k}.$$

But the tail sum that bounds the uniform error,

$$\sum_{k=n+1}^\infty |c_k| \;\lesssim\; \sum_{k=n+1}^\infty \frac{1}{k},$$

diverges. The bound is useless, and that is not an artefact of our bounding: there really is no uniform convergence when $f$ has a jump. Near the discontinuity the truncated series exhibits a persistent overshoot whose magnitude does not shrink as $n \to \infty$. Only its width shrinks.

[Figure: truncated Chebyshev series of $\mathrm{sign}(x)$, showing the Gibbs overshoot near the jump.]

The peak value of the truncated series can be derived in closed form. The idea is standard: write the partial sum $S_n(x)$ of $\mathrm{sign}(x)$ via its Chebyshev (equivalently, Fourier) coefficients, differentiate to locate the first local maximum after the jump, and take $n \to \infty$. The maximum sits at $x \approx \pi/(2n)$, and substituting $t = 2nx$ turns the partial sum into a Riemann integral:

$$\lim_{n \to \infty} \max_x S_n(x) \;=\; \frac{2}{\pi} \int_0^\pi \frac{\sin t}{t}\, dt \;=\; \frac{2}{\pi}\, \mathrm{Si}(\pi) \;\approx\; 1.1789,$$

where $\mathrm{Si}(x) = \int_0^x \frac{\sin t}{t}\, dt$ is the sine integral. The overshoot above the upper plateau (value 1) is $\approx 0.179$, or $\approx 8.95\%$ of the total jump height 2. This Gibbs–Wilbraham constant is universal: it depends only on the existence of a jump, not on the specific function or its location. The effect was first described by Wilbraham (1848) and famously rediscovered by Gibbs in Nature (1898), a half-century apart. It is the same constant and the same phenomenon as for Fourier series, since the two problems coincide under $x = \cos\theta$.
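Both the constant and the location of the peak are easy to confirm numerically. A sketch using the Fourier twin of the problem, the square-wave series $\frac{4}{\pi}\sum_{k \text{ odd}} \sin(kx)/k$ with $m$ odd harmonics, whose first maximum sits near $\pi/(2m)$; SciPy's `sici` returns the pair $(\mathrm{Si}, \mathrm{Ci})$:

```python
import numpy as np
from scipy.special import sici

print("limit:", 2 / np.pi * sici(np.pi)[0])       # (2/pi) Si(pi) ~ 1.17898

x = np.linspace(1e-4, 0.2, 20001)                 # grid just to the right of the jump
for m in [32, 128, 512]:
    k = 2 * np.arange(m) + 1                      # odd harmonics 1, 3, ..., 2m-1
    S = (4 / np.pi) * (np.sin(np.outer(x, k)) / k).sum(axis=1)
    i = S.argmax()
    print(f"m = {m:3d}   max = {S[i]:.5f}   peak location * 2m/pi = {x[i]*2*m/np.pi:.3f}")
```

The maximum locks onto $1.17898\ldots$ while the peak location scales like $\pi/(2m)$: fixed height, shrinking width.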

The Sobolev picture sharpens the lesson. Membership in $H^s$ is equivalent to $\sum_k k^{2s} |c_k|^2 < \infty$. For a function with $|c_k| \sim 1/k$ this becomes $\sum_k k^{2s-2}$, which converges iff $s < 1/2$. The jump function sits exactly on the Sobolev threshold $s = 1/2$, which is also the embedding boundary $H^s \hookrightarrow C^0$: above it every $H^s$ function is continuous, below it continuity can fail. A discontinuous function cannot be the uniform limit of continuous partial sums, and the persistent Gibbs overshoot is the geometric record of that failure.

Bounded variation is enough for pointwise convergence by the Dirichlet–Jordan theorem. For any BV function $f$ on $[-1, 1]$ the Chebyshev series converges at every point $x$ to

$$\tfrac{1}{2}\bigl(f(x{+}) + f(x{-})\bigr),$$

the average of the one-sided limits. At continuity points this is just $f(x)$; at a jump it is the midpoint. For $\mathrm{sign}(x)$ the series thus converges to 0 at $x = 0$ and to $\pm 1$ elsewhere, everywhere pointwise. What fails is uniform convergence.

Remark 1 (Pointwise versus uniform)

Why is uniform stronger, and why doesn't pointwise convergence rule out the Gibbs overshoot? The two notions differ in what the index $N$ is allowed to depend on.

  • Pointwise: for every $x$ and every $\varepsilon > 0$, there is an $N = N(x, \varepsilon)$ such that $|S_n(x) - f(x)| < \varepsilon$ for all $n \ge N$. The $N$ you need can grow as $x$ moves.

  • Uniform: for every $\varepsilon > 0$, there is a single $N = N(\varepsilon)$ that works for every $x$ at once, so $\sup_x |S_n(x) - f(x)| \to 0$.

Gibbs is exactly what pointwise allows and uniform forbids. The overshoot has a fixed height $\approx 0.179$, but its location slides toward the jump as $n$ grows (the peak sits at distance $\approx \pi/(2n)$). For any fixed $x_0 \neq 0$, once $n$ is large enough the overshoot sits between $x_0$ and the jump rather than at $x_0$, and $S_n(x_0)$ approaches $\mathrm{sign}(x_0)$ cleanly. So every fixed $x$ sees convergence: pointwise holds. But $\sup_x |S_n(x) - \mathrm{sign}(x)|$ never falls below $\approx 0.179$, because there is always some $x$ near 0 where the overshoot has moved to: uniform fails.

The bad $x$ is moving, and pointwise convergence lets that happen. Uniform convergence is the promise that no such moving pocket of error can persist.

Take-away. BV gives coefficient decay at rate $1/k$ and pointwise convergence, but not uniform convergence. If you must approximate discontinuous data, split the domain at the jumps and interpolate each smooth piece separately.
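A sketch of that splitting strategy on an illustrative function of our own choosing, $f(x) = \mathrm{sign}(x)\cos(x)$: globally it is Gibbs-limited, but each half, extended to the jump by its one-sided limit, is analytic and resolves with a tiny interpolant.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

f = lambda x: np.sign(x) * np.cos(x)          # jump of height 2 at x = 0
xx = np.linspace(-1, 1, 4001)

# One global interpolant: accuracy stalls at the O(1) Gibbs overshoot
c = C.chebinterpolate(f, 400)
print("global, n=400 :", np.max(np.abs(f(xx) - C.chebval(xx, c))))

# Split at the jump; each piece uses its one-sided limit at x = 0
pieces = [(-1.0, 0.0, lambda x: -np.cos(x)),  # f on [-1, 0), left limit at 0
          (0.0, 1.0, lambda x: np.cos(x))]    # f on (0, 1], right limit at 0
tt = np.linspace(-1, 1, 2001)
for a, b, h in pieces:
    g = lambda t: h((b - a) / 2 * t + (a + b) / 2)   # pull the piece back to [-1, 1]
    c = C.chebinterpolate(g, 16)
    print(f"piece [{a}, {b}], n=16 :", np.max(np.abs(g(tt) - C.chebval(tt, c))))
```

Four hundred global coefficients still leave an $O(1)$ error at the jump; sixteen per smooth piece reach machine precision.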

From projection to interpolant

The rates above control the projection $P_n f$, the truncated Chebyshev series. What we actually compute is the Chebyshev interpolant $p_n$ (the polynomial through $f$ at $n+1$ nodes), and the two agree only up to aliasing. To transfer rates from $P_n f$ to $p_n$ we need one more ingredient.

Definition 2 (Lebesgue function and Lebesgue constant)

For interpolation nodes $x_0, \ldots, x_n$ with Lagrange basis $\ell_j$, the Lebesgue function is

$$\Lambda_n(x) \;=\; \sum_{j=0}^n |\ell_j(x)|,$$

and the Lebesgue constant is $\Lambda_n = \max_{x \in [-1,1]} \Lambda_n(x)$.

At $n = 10$ we can see each $|\ell_j(x)|$ individually and watch them combine into $\Lambda_n(x)$.

[Figure: individual $|\ell_j(x)|$ (grey) and their sum $\Lambda_n(x)$ (red) at $n = 10$, for equispaced and Chebyshev nodes.]

Grey curves are the individual $|\ell_j(x)|$; the red envelope is their sum $\Lambda_n(x)$. On equispaced nodes the basis functions near the endpoints grow large, and their sum peaks sharply there. On Chebyshev nodes the basis functions stay bounded by roughly 1, and their sum stays close to 1 across the interval.

Theorem 4 (Near-best approximation)

Let $p_n$ interpolate $f$ at the nodes and $p_n^*$ denote the best $L^\infty$ polynomial approximation of degree $\le n$. Then

$$\|f - p_n\|_\infty \;\le\; (1 + \Lambda_n)\,\|f - p_n^*\|_\infty.$$

The Lebesgue constant is the amplification factor between the best polynomial approximation of $f$ and the interpolant we can actually compute. Since $P_n f$ is some polynomial of degree $\le n$, $\|f - p_n^*\|_\infty \le \|f - P_n f\|_\infty$. Chaining with the theorem,

$$\|f - p_n\|_\infty \;\le\; (1 + \Lambda_n)\, \|f - p_n^*\|_\infty \;\le\; (1 + \Lambda_n)\, \|f - P_n f\|_\infty.$$

The interpolation error splits into two factors: the amplification $(1 + \Lambda_n)$, which depends only on the choice of nodes, and the projection error $\|f - P_n f\|_\infty$, which depends only on the regularity of $f$.

To see how the Lebesgue constant scales with $n$, track the maximum of $\Lambda_n(x)$ over a range of $n$:

[Figure: Lebesgue constant $\Lambda_n$ versus $n$ for equispaced and Chebyshev nodes.]
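Computing $\Lambda_n$ straight from the definition takes only a few lines; the dense evaluation grid and the sampled node counts below are arbitrary choices for the sketch:

```python
import numpy as np

def lebesgue_constant(nodes, xx):
    """Max over the grid xx of the Lebesgue function sum_j |l_j(x)|."""
    L = np.zeros_like(xx)
    for j, xj in enumerate(nodes):
        others = np.delete(nodes, j)
        # Lagrange basis polynomial l_j evaluated on xx
        L += np.abs(np.prod((xx[:, None] - others) / (xj - others), axis=1))
    return L.max()

xx = np.linspace(-1, 1, 4001)
for n in [10, 20, 40]:
    equi = np.linspace(-1, 1, n + 1)
    cheb = np.cos(np.pi * np.arange(n + 1) / n)
    print(f"n = {n:2d}   equispaced: {lebesgue_constant(equi, xx):12.4e}"
          f"   Chebyshev: {lebesgue_constant(cheb, xx):6.3f}"
          f"   (2/pi)log(n+1): {2 / np.pi * np.log(n + 1):6.3f}")
```

The equispaced column explodes exponentially while the Chebyshev column hugs $\frac{2}{\pi}\log(n+1)$ plus a small constant, matching the asymptotics quoted below.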

The asymptotic rates (Trefethen (2013)) are

$$\Lambda_n^{\text{equi}} \sim \frac{2^{n+1}}{e\, n \log n}, \qquad \Lambda_n^{\text{Cheb}} = \frac{2}{\pi} \log(n+1) + O(1).$$

No choice of $n+1$ interpolation nodes on $[-1, 1]$ can do better than $\frac{2}{\pi}\log n$; the Chebyshev rate is optimal up to a constant. For equispaced nodes the interpolation penalty grows exponentially, which recovers the Runge picture: for functions like $1/(1+25x^2)$, whose best-approximation error decays only like $\rho^{-n}$ with $\rho$ close to 1, the penalty wins and $\|f - p_n\|_\infty$ fails to converge.

Choosing $N$ Adaptively

The decay theorems guarantee that $|c_k| \to 0$ at a rate controlled by the regularity of $f$. In practice we do not want to set $N$ by hand. We want an algorithm that samples $f$, reads off the coefficient tail, and returns the smallest $N$ for which truncation is already below tolerance. The following is the algorithm Chebfun uses, simplified.

Algorithm 1 (Adaptive Chebyshev truncation)

Input. Function $f$ on $[-1,1]$; tolerance $\mathrm{tol}$; cap $N_{\max}$.

Output. Truncated coefficient vector $(c_0, \ldots, c_M)$ with $\|f - p_M\|_\infty \lesssim \mathrm{tol}$.

  1. Set $N \leftarrow 16$.

  2. Sample $f$ at the $N+1$ Chebyshev nodes; compute $c_0, \ldots, c_N$ by DCT.

  3. Inspect the last $10\%$ of the coefficients, $c_k$ for $k \ge \lfloor 0.9\, N \rfloor$.

  4. If $\max |c_k|$ in that block is below $\mathrm{tol}$: the tail has resolved. Find the smallest $M$ such that $|c_M|, \ldots, |c_N|$ all lie below $\mathrm{tol}$; return $(c_0, \ldots, c_M)$.

  5. Else double $N$ and return to step 2. If $N > N_{\max}$, report failure to resolve.

The termination test in step 4 is the operational form of the decay theorems. If $f$ is analytic or $C^r$ with $r \ge 1$, the $|c_k|$ eventually drop below any positive tolerance, so the loop terminates. If $f$ has a jump, $|c_k|$ decays like $1/k$ at best, and the tail never clears the tolerance. Non-termination is a correct diagnostic that the regularity hypothesis of Theorem 1 has failed.
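A direct transcription of the simplified loop might look as follows. This is a sketch of Algorithm 1 as stated, not Chebfun's actual implementation, whose chopping rule is considerably more sophisticated:

```python
import numpy as np

def cheb_coeffs(f, n):
    # DCT-I via a real FFT, as in the earlier sketch
    v = f(np.cos(np.pi * np.arange(n + 1) / n))
    c = np.fft.rfft(np.concatenate([v, v[-2:0:-1]])).real[: n + 1] / n
    c[0] /= 2; c[-1] /= 2
    return c

def adaptive_cheb(f, tol=1e-12, n_max=2**16):
    n = 16                                        # step 1
    while n <= n_max:
        c = cheb_coeffs(f, n)                     # step 2
        if np.abs(c[int(0.9 * n):]).max() < tol:  # steps 3-4: tail resolved?
            m = n + 1
            while m > 0 and abs(c[m - 1]) < tol:  # smallest M with c_M..c_N < tol
                m -= 1
            return c[: m + 1]                     # step 4: chop and return
        n *= 2                                    # step 5: double and retry
    raise RuntimeError("tolerance not reached: f may lack the regularity "
                       "assumed by Theorem 1 (e.g. a jump)")

c = adaptive_cheb(lambda x: np.tanh(20 * x))
print(f"kept {len(c)} coefficients")              # a few hundred for tanh(20x)
```

Called on $\mathrm{sign}(x)$ instead, the loop runs to `n_max` and raises, which is exactly the non-termination diagnostic described above.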

Worked example: $\tanh(20x)$

The function $\tanh(20x)$ is analytic on $[-1, 1]$ (its nearest singularities are the poles of $\tanh$ at $x = \pm i\pi/40$) and approaches $\pm 1$ over a short transition near the origin. It is smooth enough for adaptivity to succeed, steep enough that a small $N$ will not do.

[Figure: adaptive resolution of $\tanh(20x)$: coefficient spectra at the successive values of $N$ tried by the loop.]

The loop doubles $N$ until the top $10\%$ of the coefficient spectrum sits below $10^{-12}$, then reports the first index $M$ at which the tail has flattened. The caller receives $(c_0, \ldots, c_M)$ and a near-best polynomial approximant of automatically chosen degree.

References
  1. Trefethen, L. N. (2013). Approximation Theory and Approximation Practice. SIAM.