
Big Idea

Polynomial interpolation is a change of basis between two equivalent representations of the same polynomial: its values $\{f_j\}$ at $n+1$ nodes and its coefficients $\{c_j\}$ in some basis of $\mathbb{P}_n$. The Vandermonde matrix is that change-of-basis map. In the monomial basis this map is catastrophically ill-conditioned. In the Lagrange basis it is the identity, but naive evaluation is unstable. The naive Lagrange evaluation and the Vandermonde linear solve are numerically unstable for moderate $n$ and are not the algorithms used in practice. The barycentric formula is.

The Interpolation Problem

Given $n+1$ distinct nodes $x_0 < x_1 < \cdots < x_n$ and values $f_0, \ldots, f_n$, find a polynomial $p_n \in \mathbb{P}_n$ such that

$$p_n(x_j) = f_j, \qquad j = 0, 1, \ldots, n.$$

Theorem 1 (Existence and Uniqueness)

For any $n+1$ distinct nodes there exists a unique $p_n \in \mathbb{P}_n$ interpolating the data.

Proof 1

Uniqueness. If $p, q \in \mathbb{P}_n$ both interpolate the data then $d = p - q \in \mathbb{P}_n$ has $n+1$ roots, so $d \equiv 0$.

Existence. The Lagrange formula below constructs one.

Three Bases for $\mathbb{P}_n$

Polynomials of degree $\le n$ form an $(n+1)$-dimensional vector space. Picking a basis $\{\phi_0, \ldots, \phi_n\}$ writes every polynomial as $p_n(x) = \sum_j c_j\, \phi_j(x)$. The interpolation conditions $p_n(x_i) = f_i$ then become a linear system $\Phi\, \mathbf{c} = \mathbf{f}$ where $\Phi_{ij} = \phi_j(x_i)$. The cost of interpolation, its conditioning, and how easily we can extend it later all depend on which basis we pick. We discuss three.

Monomial basis

The basis you already know, $\{1, x, x^2, \ldots, x^n\}$. The motivation is familiarity: every polynomial is written in this basis when you first meet it. The coefficients $c_j$ are then the usual coefficients of $x^j$, and the values↔coefficients system $\Phi\, \mathbf{c} = \mathbf{f}$ takes the explicit form

$$\underbrace{\begin{pmatrix} 1 & x_0 & x_0^2 & \cdots & x_0^n \\ 1 & x_1 & x_1^2 & \cdots & x_1^n \\ \vdots & & & & \vdots \\ 1 & x_n & x_n^2 & \cdots & x_n^n \end{pmatrix}}_{V} \begin{pmatrix} c_0 \\ c_1 \\ \vdots \\ c_n \end{pmatrix} = \begin{pmatrix} f_0 \\ f_1 \\ \vdots \\ f_n \end{pmatrix}.$$

$V$ is the Vandermonde matrix. Each row records the powers of one node; each column records one monomial sampled at all nodes. The system is solvable whenever the nodes are distinct, but solving it is numerically a disaster: the condition number $\kappa(V)$ grows exponentially in $n$.

*Figure: condition number $\kappa(V)$ of the Vandermonde matrix versus $n$, for equispaced and Chebyshev nodes.*

For equispaced nodes $\kappa(V)$ blows past $1/\varepsilon_{\text{mach}}$ already by $n \approx 25$. Solving $V\mathbf{c} = \mathbf{f}$ via a general linear solver in double precision then loses essentially all significant digits. Chebyshev nodes (anticipated in Where to Place the Nodes) help, but even then $\kappa(V)$ grows.
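The growth is easy to observe directly. Here is a minimal sketch, assuming NumPy (`vandermonde_cond` is an illustrative name, not from the chapter):

```python
import numpy as np

def vandermonde_cond(nodes):
    """2-norm condition number of the Vandermonde matrix at these nodes."""
    # increasing=True orders the columns 1, x, x^2, ..., x^n
    V = np.vander(np.asarray(nodes, dtype=float), increasing=True)
    return np.linalg.cond(V)

# Equispaced vs. Chebyshev nodes on [-1, 1]
for n in (5, 10, 20):
    equi = np.linspace(-1.0, 1.0, n + 1)
    cheb = np.cos(np.arange(n + 1) * np.pi / n)
    print(f"n={n:2d}  equispaced: {vandermonde_cond(equi):.2e}  "
          f"Chebyshev: {vandermonde_cond(cheb):.2e}")
```

Even at these modest degrees the equispaced condition number has grown by many orders of magnitude.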

This blow-up reflects the choice of algorithm applied to the values↔coefficients map, not the conditioning of polynomial interpolation itself. Specialized solvers that exploit the Vandermonde structure, such as the Björck–Pereyra algorithm (Björck & Pereyra, 1970), recover the coefficients to far higher accuracy than $\kappa(V)$ alone would suggest.

A more robust strategy is to choose a basis in which the values↔coefficients matrix $\Phi$ has trivial structure: triangular, identity, or orthogonal. The conditioning then ceases to be a problem, because $\kappa(\Phi) = 1$ for the identity and for any orthogonal matrix, and triangular systems are solved by direct substitution. The Newton basis (below) makes $\Phi$ lower triangular. The Lagrange basis after that makes it the identity. The Chebyshev basis at Chebyshev nodes, introduced in Where to Place the Nodes, makes $\Phi$ (a scaling of) the Discrete Cosine Transform matrix, which is orthogonal up to that scaling. From the linear algebra you have already developed in QR Factorization we know that orthogonal factors do not amplify error. That is exactly why these alternative bases sidestep the Vandermonde blow-up: they replace the ill-conditioned dense solve by a multiplication against a perfectly conditioned matrix, which the FFT moreover computes in $O(n \log n)$ operations.

Newton basis

The motivation for Newton is incrementality: we want a basis where the values↔coefficients matrix $\Phi$ is lower triangular, so that adding a new data point appends one equation and one unknown without disturbing any earlier coefficient. That forces $\phi_k$ to vanish at $x_0, \ldots, x_{k-1}$, giving the basis

$$\{1,\; (x-x_0),\; (x-x_0)(x-x_1),\; \ldots,\; (x-x_0)\cdots(x-x_{n-1})\}.$$

The coefficients in this basis are the divided differences of $f$, and the interpolant of degree $n+1$ is obtained from the interpolant of degree $n$ by adding one new term. Newton is therefore the basis of choice when the number of nodes is not known in advance, or when nodes are added adaptively. We will not need this construction explicitly in what follows. Its modern replacement is the adaptive evaluation discussed in Adaptive QR: Choosing N on the Fly, which gets the same incremental property in a different way.
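To make the incrementality concrete, here is a hedged sketch of the standard divided-difference table and nested-multiplication evaluation of the Newton form, assuming NumPy (the function names `divided_differences` and `newton_eval` are illustrative, not from the chapter):

```python
import numpy as np

def divided_differences(x, f):
    """Newton coefficients c[k] = f[x_0, ..., x_k], built column by column."""
    x = np.asarray(x, dtype=float)
    c = np.array(f, dtype=float)
    for k in range(1, len(x)):
        # NumPy evaluates the right-hand side before assigning, so the
        # previous column's values are used throughout this update
        c[k:] = (c[k:] - c[k-1:-1]) / (x[k:] - x[:-k])
    return c  # c[k] multiplies (x - x_0) ... (x - x_{k-1})

def newton_eval(x, c, t):
    """Evaluate the Newton form at t by nested multiplication (Horner-style)."""
    p = c[-1]
    for k in range(len(c) - 2, -1, -1):
        p = p * (t - x[k]) + c[k]
    return p
```

Appending a node appends one coefficient: because $c_k$ depends only on $x_0, \ldots, x_k$, the earlier entries of the returned array are unchanged, which is exactly the triangular structure at work.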

Lagrange basis

The motivation here is hands-on: build the interpolant as an explicit product of linear factors that automatically passes through the data. For each node $x_j$, the polynomial

$$\ell_j(x) = \prod_{i \ne j} \frac{x - x_i}{x_j - x_i}$$

is, by construction, zero at every other node $x_i$ (one of the factors in the product vanishes there) and equal to 1 at $x_j$ (the numerator matches the denominator). So $\ell_j(x_i) = \delta_{ij}$, and the linear combination

$$p_n(x) = \sum_{j=0}^n f_j\, \ell_j(x)$$

automatically satisfies $p_n(x_i) = f_i$ at every node. No system to solve, no inversion: just products of linear factors weighted by the data. As a bonus, the values↔coefficients matrix $\Phi$ is now the identity, so the coefficients of $p_n$ in this basis are literally the data values.

The cost is paid in evaluation: the formula above is $O(n^2)$ per evaluation point and is prone to overflow as $n$ grows. The barycentric reformulation below fixes both issues without leaving the basis.
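A direct transcription of the formula (a sketch in plain Python; `lagrange_naive` is an illustrative name) makes the $O(n^2)$-per-point cost visible in the nested loops:

```python
def lagrange_naive(x, f, t):
    """Evaluate p_n(t) = sum_j f_j * l_j(t) directly: O(n^2) per point t."""
    p = 0.0
    for j in range(len(x)):
        lj = 1.0  # build l_j(t) as the product of (t - x_i) / (x_j - x_i)
        for i in range(len(x)):
            if i != j:
                lj *= (t - x[i]) / (x[j] - x[i])
        p += f[j] * lj
    return p
```

Each of the $n+1$ basis functions costs $n$ multiplications per point, and the long products can overflow for large $n$; the barycentric formula below avoids both problems.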

*Figure: the Lagrange basis polynomials $\ell_j$ on a set of nodes.*

Each $\ell_j$ is the unique degree-$n$ polynomial that is 1 at $x_j$ and 0 at every other node. The interpolant $\sum f_j \ell_j$ inherits the required values automatically.

The Barycentric Formula

We can reorganize the Lagrange formula algebraically into a form that evaluates in $O(n)$ and is numerically stable. Define the node polynomial and barycentric weights

$$\ell(x) = \prod_{k=0}^n (x - x_k), \qquad \lambda_j = \frac{1}{\prod_{k \ne j}(x_j - x_k)}.$$

Then $\ell_j(x) = \ell(x)\, \lambda_j / (x - x_j)$, which expresses each basis function through the common polynomial $\ell(x)$ and its weight $\lambda_j$. Combining this factorisation with the following basic property of the Lagrange basis gives the second barycentric formula.

Lemma 1 (Partition of unity)

For any distinct nodes $x_0, \ldots, x_n$, the Lagrange basis satisfies

$$\sum_{j=0}^n \ell_j(x) \;\equiv\; 1 \qquad \text{for all } x \in \mathbb{R}.$$

Proof 2

Consider the data $f_j = 1$ at each node $x_j$. There are two polynomials in $\mathbb{P}_n$ that match this data:

  • The constant polynomial $p(x) \equiv 1$, which obviously satisfies $p(x_j) = 1$ at every $x_j$ and has degree $0 \le n$.

  • The Lagrange interpolant $q(x) = \sum_{j=0}^n 1 \cdot \ell_j(x) = \sum_{j=0}^n \ell_j(x)$, which by construction satisfies $q(x_j) = \ell_j(x_j) = 1$ and is a sum of polynomials of degree $\le n$.

By the uniqueness half of Theorem 1, there is only one polynomial in $\mathbb{P}_n$ matching the data, so $p \equiv q$ as polynomials. Equality of polynomials means equality at every $x$, not just at the nodes:

$$1 \;=\; \sum_{j=0}^n \ell_j(x) \qquad \text{for all } x \in \mathbb{R}.$$

Proposition 1 (Barycentric Interpolation Formula)

$$p_n(x) = \frac{\displaystyle\sum_{j=0}^n \frac{\lambda_j}{x - x_j}\, f_j}{\displaystyle\sum_{j=0}^n \frac{\lambda_j}{x - x_j}}, \qquad p_n(x_j) = f_j.$$

Proof 3

Start from the Lagrange formula and substitute $\ell_j(x) = \ell(x)\, \lambda_j / (x - x_j)$:

$$p_n(x) \;=\; \sum_{j=0}^n f_j\, \ell_j(x) \;=\; \ell(x) \sum_{j=0}^n \frac{\lambda_j}{x - x_j}\, f_j.$$

The partition of unity identity $\sum_j \ell_j(x) = 1$, with the same substitution applied, gives

$$1 \;=\; \sum_{j=0}^n \ell_j(x) \;=\; \ell(x) \sum_{j=0}^n \frac{\lambda_j}{x - x_j}, \qquad \text{so} \qquad \ell(x) \;=\; \frac{1}{\displaystyle \sum_{j=0}^n \frac{\lambda_j}{x - x_j}}.$$

Substituting this expression for $\ell(x)$ into the formula for $p_n$ eliminates the prefactor and produces the ratio in the proposition. At a node $x = x_j$ both sums share a $1/(x - x_j)$ singularity that cancels in the ratio, leaving $p_n(x_j) = f_j$.

Once the weights $\lambda_j$ have been computed, at a one-time cost of $O(n^2)$, each evaluation costs only $O(n)$.
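A minimal sketch of this two-stage scheme, assuming NumPy (the function names are illustrative, not from the chapter):

```python
import numpy as np

def barycentric_weights(x):
    """lambda_j = 1 / prod_{k != j} (x_j - x_k), computed once in O(n^2)."""
    x = np.asarray(x, dtype=float)
    return np.array([1.0 / np.prod(x[j] - np.delete(x, j))
                     for j in range(len(x))])

def barycentric_eval(x, f, w, t):
    """Second barycentric formula: O(n) per point t (t off the nodes)."""
    terms = w / (t - np.asarray(x, dtype=float))
    return np.dot(terms, f) / np.sum(terms)
```

At a node the formula is a $0/0$ limit, so a full implementation should detect $t = x_j$ and return $f_j$ directly; the sketch above assumes $t$ is off the nodes.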

First vs. Second Barycentric Formula

The same derivation produces a first barycentric formula that we mostly skip but is worth naming:

$$p_n(x) = \ell(x) \sum_{j=0}^n \frac{\lambda_j}{x - x_j}\, f_j, \qquad \ell(x) = \prod_{k=0}^n (x - x_k).$$

Both formulas evaluate the same polynomial; they differ only in numerical behaviour, and the difference matters at the boundary of where they apply.

In this chapter we work exclusively with Chebyshev nodes and evaluate on $[-1,1]$, so the second formula is the right default.

A Concrete Example

Here is the Lagrange interpolant of $f(x) = e^{\sin 5x}$ through $n+1$ nodes on $[-1,1]$, evaluated via the barycentric formula from Proposition 1. The left panel uses equispaced nodes; the right panel uses the Chebyshev nodes $x_j = \cos(j\pi/n)$, properly motivated in Where to Place the Nodes. For this smooth $f$ both work fine; the visual difference becomes important once $f$ is harder.

*Figure: the interpolant of $f(x) = e^{\sin 5x}$ at equispaced (left) and Chebyshev (right) nodes.*

The interpolant passes through every data point by construction, and as $n$ grows it tracks $f$ more and more closely across the interval. Whether this convergence continues as $n \to \infty$, and how fast, depends on both $f$ and the choice of nodes. We take that up in the next two sections.
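The computation behind such an experiment can be sketched as follows, assuming NumPy; `chebyshev_interp_error` is an illustrative helper that measures the maximum error of the barycentric interpolant at the Chebyshev nodes $x_j = \cos(j\pi/n)$ on a fine grid:

```python
import numpy as np

def f(x):
    return np.exp(np.sin(5 * x))

def chebyshev_interp_error(n):
    """Max error on a fine grid of the degree-n Chebyshev interpolant of f."""
    x = np.cos(np.arange(n + 1) * np.pi / n)          # Chebyshev nodes
    lam = np.array([1.0 / np.prod(x[j] - np.delete(x, j))
                    for j in range(n + 1)])           # barycentric weights
    t = np.linspace(-0.999, 0.999, 2000)              # grid avoiding the nodes
    terms = lam / (t[:, None] - x)                    # shape (2000, n+1)
    p = terms @ f(x) / terms.sum(axis=1)              # second barycentric formula
    return np.max(np.abs(p - f(t)))

for n in (10, 20, 40):
    print(n, chebyshev_interp_error(n))
```

The printed errors shrink rapidly as $n$ grows, consistent with the convergence behaviour taken up in the next sections.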

References
  1. Björck, Å., & Pereyra, V. (1970). Solution of Vandermonde systems of equations. Mathematics of Computation, 24(112), 893–903.