
The Calculus of Distributions

Big Idea

Every operation on distributions is defined by duality: to apply an operation to a distribution $T$, transfer the adjoint operation to the test function. Differentiation becomes the integration-by-parts formula, and the payoff is immediate: every distribution is infinitely differentiable.

The key idea throughout this section is the same: if an operation $A$ on smooth functions satisfies $\langle Af, \varphi \rangle = \langle f, A^\dagger \varphi \rangle$ for some adjoint $A^\dagger$ that maps test functions to test functions, then we define $\langle AT, \varphi \rangle = \langle T, A^\dagger \varphi \rangle$ for any distribution $T$.

A note on notation. We have a functional $T : \mathcal{D}(\Omega) \to \mathbb{R}$, and we want to define a new functional $AT$. The definition always acts on the test function: $AT$ is the composition $T \circ A^\dagger$, and $T$ itself is unchanged. Concretely:

$$\langle D^\alpha T, \varphi \rangle = (-1)^{|\alpha|} \langle T, D^\alpha \varphi \rangle, \qquad \langle fT, \varphi \rangle = \langle T, f\varphi \rangle, \qquad \langle \tau_h T, \varphi \rangle = \langle T, \tau_{-h}\varphi \rangle.$$

In each case the notation ($D^\alpha T$, $fT$, $\tau_h T$) suggests we are acting on $T$, but the operation is transferred to $\varphi$ via the adjoint. This is the adjoint construction from the duality chapter (Definition 1) applied concretely: if $A : \mathcal{D}(\Omega) \to \mathcal{D}(\Omega)$ is a continuous linear map, its adjoint $A^* : \mathcal{D}'(\Omega) \to \mathcal{D}'(\Omega)$ acts by $A^* T = T \circ A$. Differentiation, multiplication, and translation on distributions are all instances of this. The notation is justified by consistency: when $T = T_f$ for a classical function $f$, the distribution $AT$ agrees with the distribution $T_{Af}$ defined by applying $A$ to $f$ directly.

Multiplication by Smooth Functions

Given $f \in C^\infty(\Omega)$ and $T \in \mathcal{D}'(\Omega)$, we want to define a new distribution that deserves the name "$fT$."

Definition 1 (Multiplication by a smooth function)

Let $f \in C^\infty(\Omega)$ and $T \in \mathcal{D}'(\Omega)$. Define $U : \mathcal{D}(\Omega) \to \mathbb{R}$ by

$$U(\varphi) = T(f\varphi).$$

We write $fT := U$ and call it the product of $f$ and $T$.

Note that $U$ is well-defined as a map: $f\varphi \in \mathcal{D}(\Omega)$ whenever $\varphi \in \mathcal{D}(\Omega)$, because the product of a smooth function and a compactly supported smooth function is again compactly supported and smooth. So $T(f\varphi)$ makes sense.

Proposition 1 (Multiplication produces a distribution)

$fT \in \mathcal{D}'(\Omega)$.

Proof 1

Linearity.

$$U(a\varphi + b\psi) = T(f(a\varphi + b\psi)) = T(af\varphi + bf\psi) = a\,T(f\varphi) + b\,T(f\psi) = a\,U(\varphi) + b\,U(\psi).$$

Continuity. Suppose $\varphi_n \to \varphi$ in $\mathcal{D}(\Omega)$ (supports in a fixed compact $K$, all derivatives converge uniformly). Then $f\varphi_n \to f\varphi$ in $\mathcal{D}(\Omega)$: the supports remain in $K$, and the Leibniz rule gives

$$D^\alpha(f\varphi_n) = \sum_{\beta \leq \alpha} \binom{\alpha}{\beta} D^\beta f \cdot D^{\alpha-\beta}\varphi_n \to \sum_{\beta \leq \alpha} \binom{\alpha}{\beta} D^\beta f \cdot D^{\alpha-\beta}\varphi = D^\alpha(f\varphi)$$

uniformly on $K$, since $f$ and all its derivatives are bounded on $K$. Since $T$ is continuous,

$$U(\varphi_n) = T(f\varphi_n) \to T(f\varphi) = U(\varphi).$$

Consistency with functions. When $T = T_g$ for $g \in L^1_{\mathrm{loc}}(\Omega)$, the distribution $fT_g$ agrees with $T_{fg}$:

$$(fT_g)(\varphi) = T_g(f\varphi) = \int_\Omega g(x)\,f(x)\,\varphi(x)\,dx = T_{fg}(\varphi).$$

So the distributional product extends pointwise multiplication of functions.

Example 1 (Multiplying the Dirac delta)

For $f \in C^\infty(\Omega)$: $(f\delta)(\varphi) = \delta(f\varphi) = f(0)\varphi(0) = f(0)\,\delta(\varphi)$. So $f\delta = f(0)\delta$: the delta "evaluates" the smooth factor at its support.

More generally, $f\delta_{x_0} = f(x_0)\delta_{x_0}$.
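Since $\delta_{x_0}$ is a limit of narrow spikes, the identity $f\delta_{x_0} = f(x_0)\delta_{x_0}$ can be illustrated numerically by replacing $\delta_{x_0}$ with a narrow Gaussian. A minimal hedged sketch; the function choices below are illustrative, not from the text:

```python
import math

# Approximate delta_{x0} by a narrow Gaussian g_eps and check that
# <f delta_{x0}, phi> ~ f(x0) phi(x0), for f = cos and a Gaussian-shaped
# smooth phi used purely for illustration.

def g_eps(x, x0, eps):
    # normalized Gaussian spike of width eps centered at x0
    return math.exp(-(x - x0) ** 2 / (2 * eps ** 2)) / (eps * math.sqrt(2 * math.pi))

f = math.cos
phi = lambda x: math.exp(-x * x)

x0, eps = 0.5, 1e-3
n = 40000
a, b = x0 - 8 * eps, x0 + 8 * eps  # Gaussian tails beyond 8*eps are negligible
h = (b - a) / n
# midpoint rule for \int f(x) g_eps(x - x0) phi(x) dx
pairing = sum(f(x) * g_eps(x, x0, eps) * phi(x) * h
              for x in (a + (k + 0.5) * h for k in range(n)))
print(pairing, f(x0) * phi(x0))  # the two values agree closely
```

As $\varepsilon \to 0$ the discrepancy shrinks like $\varepsilon^2$, matching the heuristic that the spike "evaluates" the smooth factor at $x_0$.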

We cannot multiply two arbitrary distributions. The construction above requires $f$ to be smooth (or at least sufficiently regular) so that $\varphi \mapsto f\varphi$ maps $\mathcal{D}(\Omega)$ to itself. The difficulty of multiplying distributions is a fundamental limitation: it is related to the problem of renormalization in quantum field theory and was one of the motivations for Colombeau's theory of generalized functions.

Differentiation of Distributions

With the duality principle now familiar from translation and multiplication, we come to the main payoff: differentiation.

The definition

The definition is motivated by integration by parts. Suppose $f$ is smooth. Then for every $\varphi \in \mathcal{D}(\Omega)$,

$$\int_\Omega f'(x)\,\varphi(x)\,dx = -\int_\Omega f(x)\,\varphi'(x)\,dx$$

since the boundary terms vanish (test functions have compact support in $\Omega$). In the language of distributions: $T_{f'}(\varphi) = -T_f(\varphi')$. The right-hand side makes sense for any distribution $T$, not just those coming from smooth functions: given $T \in \mathcal{D}'(\Omega)$, the map $\varphi \mapsto -T(\varphi')$ is perfectly well-defined. This suggests a definition.
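The motivating integration-by-parts identity can be checked numerically. A hedged sketch, using $f = \sin$ and the standard bump test function; all choices are illustrative:

```python
import math

# Check \int f' phi dx = -\int f phi' dx for f(x) = sin(x) and the bump
# test function phi(x) = exp(-1/(1-x^2)) on (-1, 1), phi = 0 outside.

def phi(x):
    t = 1.0 - x * x
    return math.exp(-1.0 / t) if t > 0 else 0.0

def dphi(x):
    # phi'(x) = -2x/(1-x^2)^2 * phi(x)
    t = 1.0 - x * x
    return -2.0 * x / (t * t) * phi(x) if t > 0 else 0.0

def integrate(g, a, b, n=20000):
    # composite trapezoid rule
    h = (b - a) / n
    return h * (0.5 * (g(a) + g(b)) + sum(g(a + k * h) for k in range(1, n)))

lhs = integrate(lambda x: math.cos(x) * phi(x), -1.0, 1.0)    # \int f' phi
rhs = -integrate(lambda x: math.sin(x) * dphi(x), -1.0, 1.0)  # -\int f phi'
print(lhs, rhs)  # equal up to quadrature error
```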

Definition 2 (Distributional derivative)

Let $T \in \mathcal{D}'(\Omega)$ and $\alpha$ a multi-index. Define $U : \mathcal{D}(\Omega) \to \mathbb{R}$ by

$$U(\varphi) = (-1)^{|\alpha|}\, T(D^\alpha \varphi).$$

We write $D^\alpha T := U$ and call it the distributional derivative of $T$.

Theorem 1 (Every distribution is infinitely differentiable)

For any $T \in \mathcal{D}'(\Omega)$ and any multi-index $\alpha$, $D^\alpha T \in \mathcal{D}'(\Omega)$. In particular, every distribution can be differentiated infinitely many times.

Proof 2

Set $U = D^\alpha T$, i.e. $U(\varphi) = (-1)^{|\alpha|}\,T(D^\alpha\varphi)$.

Linearity.

$$U(a\varphi + b\psi) = (-1)^{|\alpha|}\,T(D^\alpha(a\varphi + b\psi)) = (-1)^{|\alpha|}\,T(a\,D^\alpha\varphi + b\,D^\alpha\psi) = a\,U(\varphi) + b\,U(\psi).$$

Continuity. If $\varphi_n \to \varphi$ in $\mathcal{D}(\Omega)$ (supports in a fixed compact $K$, all derivatives converge uniformly), then $D^\alpha \varphi_n \to D^\alpha \varphi$ in $\mathcal{D}(\Omega)$ (same compact $K$, all derivatives still converge uniformly; differentiating does not enlarge supports). Since $T$ is continuous,

$$U(\varphi_n) = (-1)^{|\alpha|}\,T(D^\alpha \varphi_n) \to (-1)^{|\alpha|}\,T(D^\alpha \varphi) = U(\varphi).$$

Since $U$ is linear and continuous, $U \in \mathcal{D}'(\Omega)$. As $T$ and $\alpha$ were arbitrary, we can differentiate again: $D^\beta(D^\alpha T)$ is again a distribution, and so on indefinitely.

Consistency with functions. When $T = T_f$ for $f \in C^{|\alpha|}(\Omega)$, integration by parts gives

$$(D^\alpha T_f)(\varphi) = (-1)^{|\alpha|}\,T_f(D^\alpha\varphi) = (-1)^{|\alpha|}\int_\Omega f\,D^\alpha\varphi\,dx = \int_\Omega D^\alpha f\,\varphi\,dx = T_{D^\alpha f}(\varphi).$$

So the distributional derivative extends the classical one.

This is the main payoff of the distributional framework: differentiation is always possible, and it always produces another distribution.

The differentiation cascade

The power of distributional derivatives is best seen through a chain of examples in one dimension ($\Omega = \mathbb{R}$).

Example 2 (Heaviside to Dirac)

The Heaviside function $H(x) = \begin{cases} 1 & x > 0 \\ 0 & x < 0 \end{cases}$ is locally integrable and defines a regular distribution. Its distributional derivative is the Dirac delta:

$$\langle H', \varphi \rangle = -\langle H, \varphi' \rangle = -\int_0^\infty \varphi'(x)\,dx = \varphi(0) = \langle \delta, \varphi \rangle.$$

So $H' = \delta$ in $\mathcal{D}'(\mathbb{R})$.
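This pairing can be verified numerically. A sketch, assuming the standard bump test function (an illustrative choice):

```python
import math

# Check <H', phi> = phi(0): by definition <H', phi> = -\int_0^infty phi'(x) dx,
# and phi' vanishes beyond x = 1 for the bump phi(x) = exp(-1/(1-x^2)).

def phi(x):
    t = 1.0 - x * x
    return math.exp(-1.0 / t) if t > 0 else 0.0

def dphi(x):
    # phi'(x) = -2x/(1-x^2)^2 * phi(x)
    t = 1.0 - x * x
    return -2.0 * x / (t * t) * phi(x) if t > 0 else 0.0

n = 20000
h = 1.0 / n
# trapezoid rule for -\int_0^1 phi'(x) dx
pairing = -h * (0.5 * (dphi(0.0) + dphi(1.0)) + sum(dphi(k * h) for k in range(1, n)))
print(pairing, phi(0.0))  # both close to e^{-1} = 0.3678...
```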

Example 3 (The $|x|$ cascade)

The function $f(x) = |x|$ is continuous but not differentiable at the origin. Its distributional derivative is

$$\langle f', \varphi \rangle = -\int_{-\infty}^\infty |x|\,\varphi'(x)\,dx = -\int_{-\infty}^0 (-x)\varphi'(x)\,dx - \int_0^\infty x\,\varphi'(x)\,dx.$$

Integrating by parts on each half-line (boundary terms at $\pm\infty$ vanish because $\varphi$ has compact support; at $0$ the contributions from both sides cancel):

$$= -\int_{-\infty}^0 \varphi(x)\,dx + \int_0^\infty \varphi(x)\,dx = \int_{-\infty}^\infty \operatorname{sgn}(x)\,\varphi(x)\,dx.$$

So $|x|' = \operatorname{sgn}(x)$ as distributions. Differentiating once more:

$$\langle \operatorname{sgn}', \varphi \rangle = -\int_{-\infty}^\infty \operatorname{sgn}(x)\,\varphi'(x)\,dx = \int_{-\infty}^0 \varphi'(x)\,dx - \int_0^\infty \varphi'(x)\,dx = 2\varphi(0) = \langle 2\delta, \varphi \rangle.$$

The full cascade is:

$$|x| \xrightarrow{D} \operatorname{sgn}(x) \xrightarrow{D} 2\delta_0 \xrightarrow{D} 2\delta_0' \xrightarrow{D} \cdots$$

Each step is well-defined as a distributional derivative. The first step produces a discontinuous function, the second a measure, and the third an object that is not even a measure, but all are distributions.
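The first step of the cascade, $\langle |x|', \psi \rangle = \langle \operatorname{sgn}, \psi \rangle$, can be checked numerically. A symmetric test function would make both sides vanish, so the hedged sketch below uses an asymmetric one; all choices are illustrative:

```python
import math

# Check -<|x|, psi'> = <sgn, psi> for the asymmetric, smooth, compactly
# supported psi(x) = (x + 2) * exp(-1/(1-x^2)) on (-1, 1).

def bump(x):
    t = 1.0 - x * x
    return math.exp(-1.0 / t) if t > 0 else 0.0

def dbump(x):
    t = 1.0 - x * x
    return -2.0 * x / (t * t) * bump(x) if t > 0 else 0.0

psi = lambda x: (x + 2.0) * bump(x)
dpsi = lambda x: bump(x) + (x + 2.0) * dbump(x)  # product rule

def integrate(g, a, b, n=20000):
    h = (b - a) / n
    return h * (0.5 * (g(a) + g(b)) + sum(g(a + k * h) for k in range(1, n)))

lhs = -integrate(lambda x: abs(x) * dpsi(x), -1.0, 1.0)        # -<|x|, psi'>
rhs = integrate(psi, 0.0, 1.0) - integrate(psi, -1.0, 0.0)     # <sgn, psi>
print(lhs, rhs)  # equal up to quadrature error
```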

Example 4 (Differentiating $\log|x|$)

The function $\log|x|$ is locally integrable on $\mathbb{R}$ (the singularity at the origin is integrable). Its distributional derivative is the principal value distribution:

$$(\log|x|)' = \mathrm{p.v.}\,\frac{1}{x}$$

in $\mathcal{D}'(\mathbb{R})$. This can be verified by direct computation:

$$\langle (\log|x|)', \varphi \rangle = -\int_{-\infty}^\infty \log|x|\,\varphi'(x)\,dx = \lim_{\varepsilon \to 0} \int_{|x|>\varepsilon} \frac{\varphi(x)}{x}\,dx$$

where the last equality follows from integration by parts on $(-\infty, -\varepsilon)$ and $(\varepsilon, \infty)$ and observing that the boundary terms $\log\varepsilon\,[\varphi(\varepsilon) - \varphi(-\varepsilon)] \to 0$ since $\varphi$ is smooth.

Properties of distributional differentiation

Proposition 2 (Basic properties)

Let $S, T \in \mathcal{D}'(\Omega)$, $\alpha, \beta$ multi-indices, $c \in \mathbb{R}$.

  1. Linearity: $D^\alpha(cS + T) = cD^\alpha S + D^\alpha T$.

  2. Commutativity of mixed partials: $D^\alpha D^\beta T = D^\beta D^\alpha T = D^{\alpha + \beta} T$.

  3. Consistency: If $f \in C^{|\alpha|}(\Omega)$, the distributional derivative $D^\alpha T_f$ agrees with the classical derivative $T_{D^\alpha f}$.

  4. Continuity: If $T_n \to T$ in $\mathcal{D}'(\Omega)$, then $D^\alpha T_n \to D^\alpha T$ in $\mathcal{D}'(\Omega)$.

Proof 3

All four follow directly from the definition $\langle D^\alpha T, \varphi \rangle = (-1)^{|\alpha|} \langle T, D^\alpha \varphi \rangle$.

  1. Linearity of $T$ in the pairing.

  2. $\langle D^\alpha D^\beta T, \varphi \rangle = (-1)^{|\alpha|} \langle D^\beta T, D^\alpha \varphi \rangle = (-1)^{|\alpha|+|\beta|} \langle T, D^{\alpha+\beta}\varphi \rangle$, which is symmetric in $\alpha, \beta$ since partial derivatives of smooth functions commute.

  3. For $f \in C^{|\alpha|}$, integration by parts gives $(-1)^{|\alpha|} \int f\,D^\alpha \varphi = \int (D^\alpha f)\,\varphi$.

  4. If $\langle T_n, \psi \rangle \to \langle T, \psi \rangle$ for all $\psi \in \mathcal{D}$, then in particular for $\psi = D^\alpha \varphi$: $\langle D^\alpha T_n, \varphi \rangle = (-1)^{|\alpha|} \langle T_n, D^\alpha \varphi \rangle \to (-1)^{|\alpha|} \langle T, D^\alpha \varphi \rangle = \langle D^\alpha T, \varphi \rangle$.

Property 4 is remarkable: you can always interchange limits and derivatives in the sense of distributions. In classical analysis, this requires uniform convergence of derivatives. In the distributional framework, pointwise convergence of the functionals is enough, because the derivative is transferred to the (fixed, smooth) test function.
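A classic illustration of this interchange: $f_n(x) = \sin(nx)/n \to 0$ uniformly, so $T_{f_n} \to 0$ in $\mathcal{D}'$, and Property 4 then forces $f_n'(x) = \cos(nx) \to 0$ in $\mathcal{D}'$ even though $\cos(nx)$ converges at no point. A hedged numerical sketch (the test-function choice is illustrative):

```python
import math

# Pair cos(nx) against the bump test function phi(x) = exp(-1/(1-x^2)) on
# (-1, 1) and watch the pairings shrink toward 0 as n grows.

def phi(x):
    t = 1.0 - x * x
    return math.exp(-1.0 / t) if t > 0 else 0.0

def pairing(freq, n=40000):
    # trapezoid rule for <cos(freq x), phi> = \int_{-1}^{1} cos(freq x) phi(x) dx
    h = 2.0 / n
    return h * (0.5 * (math.cos(-freq) * phi(-1.0) + math.cos(freq) * phi(1.0))
                + sum(math.cos(freq * (-1.0 + k * h)) * phi(-1.0 + k * h)
                      for k in range(1, n)))

vals = [abs(pairing(m)) for m in (1, 10, 100)]
print(vals)  # decreasing toward 0
```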

Proposition 3 (Leibniz rule for distributions)

For $f \in C^\infty(\Omega)$ and $T \in \mathcal{D}'(\Omega)$:

$$D(fT) = f'T + fT'$$

and more generally, for any multi-index $\alpha$:

$$D^\alpha(fT) = \sum_{\beta \leq \alpha} \binom{\alpha}{\beta} D^\beta f \cdot D^{\alpha - \beta} T.$$

Proof 4

For the first-order case in one dimension:

$$\langle D(fT), \varphi \rangle = -\langle fT, \varphi' \rangle = -\langle T, f\varphi' \rangle.$$

Now $f\varphi' = (f\varphi)' - f'\varphi$, so

$$= -\langle T, (f\varphi)' \rangle + \langle T, f'\varphi \rangle = \langle T', f\varphi \rangle + \langle f'T, \varphi \rangle = \langle fT', \varphi \rangle + \langle f'T, \varphi \rangle.$$

The general case follows by induction on $|\alpha|$.

Taylor’s theorem for distributions

Now that we have both translation and differentiation, we can combine them into a distributional Taylor expansion. The idea is simple: to expand $\tau_h T$ in powers of $h$, apply $\tau_h T$ to a test function and use the classical Taylor expansion of $\varphi(x + h)$ in $h$.

Theorem 2 (Taylor’s theorem for distributions)

Let $T \in \mathcal{D}'(\mathbb{R}^d)$ and $h \in \mathbb{R}^d$. Then for every $N \geq 0$,

$$\tau_h T = \sum_{|\alpha| \leq N} \frac{(-h)^\alpha}{\alpha!}\, D^\alpha T + R_N$$

where the remainder $R_N \in \mathcal{D}'(\mathbb{R}^d)$ satisfies

$$R_N(\varphi) = T\!\left( \sum_{|\alpha| = N+1} \frac{h^\alpha}{\alpha!}\,(N+1) \int_0^1 (1-t)^N\, (D^\alpha\varphi)(\cdot + th)\,dt \right)$$

for all $\varphi \in \mathcal{D}(\mathbb{R}^d)$.

Proof 5

By definition, $(\tau_h T)(\varphi) = T(\tau_{-h}\varphi) = T(\varphi(\cdot + h))$. Since $\varphi$ is smooth, we apply the classical Taylor expansion in the variable $h$: for each fixed $x$,

$$\varphi(x + h) = \sum_{|\alpha| \leq N} \frac{h^\alpha}{\alpha!}\, D^\alpha\varphi(x) + \sum_{|\alpha| = N+1} \frac{h^\alpha}{\alpha!}\,(N+1) \int_0^1 (1-t)^N\, (D^\alpha\varphi)(x + th)\,dt.$$

This expansion, and all its $x$-derivatives, converge uniformly on compact sets (the remainder and its derivatives are controlled by finitely many derivatives of $\varphi$ on a compact set containing $\operatorname{supp}(\varphi) + [0,1] \cdot h$). So the expansion holds in $\mathcal{D}(\mathbb{R}^d)$. Applying $T$:

$$(\tau_h T)(\varphi) = \sum_{|\alpha| \leq N} \frac{h^\alpha}{\alpha!}\, T(D^\alpha\varphi) + T(\text{remainder}).$$

Now $T(D^\alpha\varphi) = (-1)^{|\alpha|}(D^\alpha T)(\varphi)$, so

$$(\tau_h T)(\varphi) = \sum_{|\alpha| \leq N} \frac{(-h)^\alpha}{\alpha!}\, (D^\alpha T)(\varphi) + R_N(\varphi).$$

Example 5 (Taylor expansion of $\delta_h$)

Taking $T = \delta$ and $d = 1$: since $\tau_h\delta = \delta_h$ (Example 9), the Taylor formula gives

$$\delta_h = \sum_{n=0}^{N} \frac{(-h)^n}{n!}\, \delta^{(n)} + R_N.$$

Let us verify this directly. Applying both sides to a test function $\varphi$:

$$\varphi(h) = \sum_{n=0}^{N} \frac{(-h)^n}{n!}\, \delta^{(n)}(\varphi) + R_N(\varphi) = \sum_{n=0}^{N} \frac{(-h)^n}{n!}\,(-1)^n\,\varphi^{(n)}(0) + R_N(\varphi) = \sum_{n=0}^{N} \frac{h^n}{n!}\,\varphi^{(n)}(0) + R_N(\varphi).$$

This is exactly the classical Taylor expansion of $\varphi$ at $0$, evaluated at $h$. The distributional Taylor theorem for $\delta$ is nothing but the classical Taylor theorem for the test function, read through the duality.
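This pairing identity can be tested numerically. The hedged sketch below uses $\varphi(x) = e^{-x^2}$, which is smooth but not compactly supported, purely as an illustration of the classical Taylor identity being invoked:

```python
import math

# Check phi(h) = sum_{n<=N} h^n/n! phi^{(n)}(0) + R_N for phi(x) = exp(-x^2).
# From the series exp(-x^2) = sum_k (-1)^k x^{2k}/k!, the even derivatives
# satisfy phi^{(2k)}(0) = (-1)^k (2k)!/k! and odd derivatives vanish at 0,
# so the degree-N partial sum collapses to sum_{2k<=N} (-h^2)^k / k!.

h = 0.3

def partial_sum(N):
    return sum((-h * h) ** k / math.factorial(k) for k in range(N // 2 + 1))

exact = math.exp(-h * h)  # phi(h), i.e. the pairing <delta_h, phi>
errors = [abs(exact - partial_sum(N)) for N in (0, 2, 4, 6)]
print(errors)  # errors shrink as N grows
```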

Weak Derivatives

Distributional derivatives exist for every distribution, but they may not be representable by a function. When they are, we get the notion of a weak derivative, which bridges distribution theory and Sobolev spaces.

Definition 3 (Weak derivative)

Let $u \in L^1_{\mathrm{loc}}(\Omega)$. We say $u$ has a weak derivative $D^\alpha u = v$ if there exists $v \in L^1_{\mathrm{loc}}(\Omega)$ such that

$$\int_\Omega u\,D^\alpha \varphi\,dx = (-1)^{|\alpha|} \int_\Omega v\,\varphi\,dx \qquad \text{for all } \varphi \in \mathcal{D}(\Omega).$$

When it exists, the weak derivative is unique (up to a.e. equivalence).

In the language of adjoints: the distributional derivative $D^\alpha T_u$ is the functional $\varphi \mapsto (-1)^{|\alpha|} \int u\,D^\alpha\varphi\,dx$. This always defines an element of $\mathcal{D}'(\Omega)$. The question is whether this functional can be represented by a function $v$ via $\varphi \mapsto \int v\,\varphi\,dx$. When it can, $v = D^\alpha u$ is the weak derivative of $u$. If moreover $u \in L^p$ and $D^\alpha u \in L^p$, then $u$ belongs to a Sobolev space $W^{k,p}(\Omega)$: the space of $L^p$ functions whose weak derivatives up to order $k$ are also in $L^p$. Sobolev spaces are the natural home for PDE solutions that are not classically differentiable, and their theory rests entirely on the distinction between distributional and weak derivatives.

Example 6 ($|x|$ has a weak derivative)

By Example 3, the distributional derivative of $|x|$ is $\operatorname{sgn}(x)$. Since $\operatorname{sgn} \in L^1_{\mathrm{loc}}(\mathbb{R})$, this is a genuine weak derivative: $|x|' = \operatorname{sgn}(x)$ weakly.

Note that $|x|$ is not classically differentiable at the origin, but the single point $\{0\}$ has measure zero and does not affect the $L^1_{\mathrm{loc}}$ function $\operatorname{sgn}(x)$.

Example 7 (The Heaviside function has no weak derivative in $L^1_{\mathrm{loc}}$)

By Example 2, $H' = \delta$ as distributions. But $\delta$ is not a regular distribution: there is no $v \in L^1_{\mathrm{loc}}(\mathbb{R})$ with $\int v\,\varphi\,dx = \varphi(0)$ for all test functions $\varphi$. So $H$ does not have a weak derivative.

This is why $H \in L^p(\Omega)$ for any bounded $\Omega$ but $H \notin W^{1,p}(\Omega)$ for any $p$: membership in the Sobolev space requires the weak derivative to exist as a function in $L^p$.

Example 8 (Characteristic function of an interval)

The function $u = \mathbf{1}_{(a,b)}$ has distributional derivative $u' = \delta_a - \delta_b$ (a difference of point masses). This is a measure but not in $L^1_{\mathrm{loc}}$, so $u$ has no weak derivative. Like the Heaviside function, $u \in L^p$ but $u \notin W^{1,p}$ for any $p$.

The weak derivative extends the classical derivative in the following sense:

  • If $f$ is classically differentiable (or even just absolutely continuous), its weak derivative exists and agrees with the classical derivative a.e.

  • Weak differentiability allows corners and kinks (finitely many, or even countably many, as long as the derivative remains locally integrable).

  • Weak differentiability does not allow jumps: a jump discontinuity produces a delta function in the derivative, which is not in $L^1_{\mathrm{loc}}$.

The dividing line is absolute continuity: a function $u$ on an interval has a weak derivative in $L^1_{\mathrm{loc}}$ if and only if $u$ is (equivalent to) an absolutely continuous function.

Translation and Reflection

The simplest operations on distributions illustrate the duality principle with no calculus required.

Recall that for functions on $\mathbb{R}^d$, translation by $h \in \mathbb{R}^d$ is $(\tau_h \varphi)(x) = \varphi(x - h)$ and reflection is $\check{\varphi}(x) = \varphi(-x)$. Given a distribution $T \in \mathcal{D}'(\mathbb{R}^d)$, we want to define the translated distribution $\tau_h T$ and the reflected distribution $\check{T}$.

Definition 4 (Translation and reflection of distributions)

Let $T \in \mathcal{D}'(\mathbb{R}^d)$.

  1. Define $U_h : \mathcal{D}(\mathbb{R}^d) \to \mathbb{R}$ by $U_h(\varphi) = T(\tau_{-h}\varphi)$. We write $\tau_h T := U_h$ and call it the translation of $T$ by $h$.

  2. Define $V : \mathcal{D}(\mathbb{R}^d) \to \mathbb{R}$ by $V(\varphi) = T(\check{\varphi})$. We write $\check{T} := V$ and call it the reflection of $T$.

Proposition 4 (Translation and reflection produce distributions)

$\tau_h T$ and $\check{T}$ are distributions, i.e. they belong to $\mathcal{D}'(\mathbb{R}^d)$.

Proof 6

We verify the two requirements for $U_h = \tau_h T$; the argument for $V = \check{T}$ is identical.

Linearity.

$$U_h(a\varphi + b\psi) = T(\tau_{-h}(a\varphi + b\psi)) = T(a\,\tau_{-h}\varphi + b\,\tau_{-h}\psi) = a\,T(\tau_{-h}\varphi) + b\,T(\tau_{-h}\psi) = a\,U_h(\varphi) + b\,U_h(\psi).$$

Continuity. If $\varphi_n \to \varphi$ in $\mathcal{D}(\mathbb{R}^d)$ (supports in a fixed compact $K$, all derivatives converge uniformly), then $\tau_{-h}\varphi_n \to \tau_{-h}\varphi$ in $\mathcal{D}(\mathbb{R}^d)$ (supports in the fixed compact $K - h$, all derivatives still converge uniformly). Since $T$ is continuous,

$$U_h(\varphi_n) = T(\tau_{-h}\varphi_n) \to T(\tau_{-h}\varphi) = U_h(\varphi).$$

Consistency with functions. When $T = T_f$ for $f \in L^1_{\mathrm{loc}}$, the distribution $\tau_h T_f$ agrees with $T_{\tau_h f}$:

$$(\tau_h T_f)(\varphi) = T_f(\tau_{-h}\varphi) = \int f(x)\,\varphi(x+h)\,dx = \int f(y-h)\,\varphi(y)\,dy = T_{\tau_h f}(\varphi).$$

So the distributional definition extends the classical one.

Example 9 (Translation of the Dirac delta)

$(\tau_h \delta)(\varphi) = \delta(\tau_{-h}\varphi) = (\tau_{-h}\varphi)(0) = \varphi(h) = \delta_h(\varphi)$. So $\tau_h \delta = \delta_h$: translating the delta moves the point of evaluation.

Convolution

Convolution with a test function is a smoothing operation: it always produces a smooth function, even when applied to a distribution. The definition uses translation and reflection from the preceding section.

Convolution of functions (review)

For $f \in L^1_{\mathrm{loc}}(\mathbb{R}^d)$ and $\psi \in \mathcal{D}(\mathbb{R}^d)$:

$$(f * \psi)(x) = \int_{\mathbb{R}^d} f(y)\,\psi(x - y)\,dy.$$

The result $f * \psi$ is smooth ($C^\infty$) and satisfies $D^\alpha(f * \psi) = f * D^\alpha \psi$.

Convolution of a distribution with a test function

Definition 5 (Convolution with a test function)

Let $T \in \mathcal{D}'(\mathbb{R}^d)$ and $\psi \in \mathcal{D}(\mathbb{R}^d)$. The convolution $T * \psi$ is the function

$$(T * \psi)(x) = \langle T, \tau_x\check{\psi} \rangle = \langle T_y, \psi(x - y) \rangle$$

where $\tau_x$ is translation (Definition 4), $\check{\psi}$ is reflection, and $T$ acts on the $y$ variable.

When $T = T_f$ for a locally integrable function $f$, this recovers the usual convolution: $(T_f * \psi)(x) = \int f(y)\,\psi(x-y)\,dy = (f * \psi)(x)$.

Theorem 3 (Regularization by convolution)

Let $T \in \mathcal{D}'(\mathbb{R}^d)$ and $\psi \in \mathcal{D}(\mathbb{R}^d)$. Then:

  1. $T * \psi \in C^\infty(\mathbb{R}^d)$: the result is always a smooth function.

  2. $D^\alpha(T * \psi) = (D^\alpha T) * \psi = T * (D^\alpha \psi)$.

  3. $\operatorname{supp}(T * \psi) \subseteq \operatorname{supp}(T) + \operatorname{supp}(\psi)$ (Minkowski sum).

Proof 7

Smoothness. Fix $x_0 \in \mathbb{R}^d$ and consider the map $x \mapsto \tau_x\check{\psi}$ from $\mathbb{R}^d$ into $\mathcal{D}(\mathbb{R}^d)$. This map is smooth: for any direction $e_i$,

$$\lim_{h \to 0} \frac{\tau_{x_0 + he_i}\check{\psi} - \tau_{x_0}\check{\psi}}{h} = \partial_i[\tau_{x_0}\check{\psi}]$$

and the convergence holds in $\mathcal{D}(\mathbb{R}^d)$ (all supports stay in a fixed compact set, all derivatives converge uniformly). Applying the continuous linear functional $T$:

$$\partial_i(T * \psi)(x_0) = \lim_{h \to 0} \frac{\langle T, \tau_{x_0 + he_i}\check{\psi}\rangle - \langle T, \tau_{x_0}\check{\psi} \rangle}{h} = \langle T, \partial_i[\tau_{x_0}\check{\psi}] \rangle.$$

Iterating gives smoothness to all orders.

Interchange of derivative and convolution. From the proof above, $D^\alpha(T * \psi)(x) = \langle T, D^\alpha_x \psi(x - \cdot) \rangle = (T * D^\alpha \psi)(x)$. For the other identity, $\langle D^\alpha T, \psi(x - \cdot) \rangle = (-1)^{|\alpha|} \langle T, D^\alpha_y \psi(x - y) \rangle = \langle T, D^\alpha_x \psi(x - \cdot) \rangle$, using the chain rule $D^\alpha_y \psi(x - y) = (-1)^{|\alpha|} D^\alpha_x \psi(x - y)$.
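A concrete instance of the theorem, sketched numerically: for $T = T_H$ (Heaviside), $(H * \psi)(x) = \int_{-\infty}^x \psi(u)\,du$ is a smooth antiderivative of $\psi$, and property 2 together with $H' = \delta$ gives $D(H * \psi) = \delta * \psi = \psi$. Function choices below are illustrative:

```python
import math

# Convolving the Heaviside step with the bump test function psi yields a
# smooth function whose derivative is psi itself.

def psi(x):
    t = 1.0 - x * x
    return math.exp(-1.0 / t) if t > 0 else 0.0

def conv(x, n=20000):
    # (H * psi)(x) = \int_{-1}^{x} psi(u) du, since psi vanishes below -1
    if x <= -1.0:
        return 0.0
    h = (x + 1.0) / n
    return h * (0.5 * (psi(-1.0) + psi(x)) + sum(psi(-1.0 + k * h) for k in range(1, n)))

x0, d = 0.25, 1e-4
deriv = (conv(x0 + d) - conv(x0 - d)) / (2 * d)  # central difference of H * psi
print(deriv, psi(x0))  # both close
```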

Approximation to the identity

Definition 6 (Mollifier)

A mollifier (or approximation to the identity) is a family $\{\rho_\varepsilon\}_{\varepsilon > 0}$ of test functions satisfying:

  1. $\rho_\varepsilon \geq 0$,

  2. $\int \rho_\varepsilon = 1$,

  3. $\operatorname{supp}(\rho_\varepsilon) \subseteq B(0, \varepsilon)$.

The standard choice is $\rho_\varepsilon(x) = \varepsilon^{-d}\rho(x/\varepsilon)$ where $\rho$ is a fixed non-negative test function with $\int \rho = 1$ supported in the unit ball.
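In $d = 1$ this standard choice can be written down explicitly. A hedged sketch, with the normalizing constant computed numerically:

```python
import math

# rho(x) = bump(x)/c on (-1, 1), with c chosen so that \int rho = 1, and
# rho_eps(x) = rho(x/eps)/eps.  All three mollifier properties can be checked.

def bump(x):
    t = 1.0 - x * x
    return math.exp(-1.0 / t) if t > 0 else 0.0

def integrate(g, a, b, n=20000):
    h = (b - a) / n
    return h * (0.5 * (g(a) + g(b)) + sum(g(a + k * h) for k in range(1, n)))

c = integrate(bump, -1.0, 1.0)  # normalizing constant, roughly 0.444
rho = lambda x: bump(x) / c
rho_eps = lambda x, eps: rho(x / eps) / eps

eps = 0.1
mass = integrate(lambda x: rho_eps(x, eps), -eps, eps)
print(c, mass)  # mass is approx 1; rho_eps vanishes outside B(0, eps)
```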

Theorem 4 (Mollification approximates distributions)

Let $T \in \mathcal{D}'(\mathbb{R}^d)$ and $\{\rho_\varepsilon\}$ a mollifier. Then:

  1. Each $T * \rho_\varepsilon$ is a smooth function.

  2. $T * \rho_\varepsilon \to T$ in $\mathcal{D}'(\mathbb{R}^d)$ as $\varepsilon \to 0$.

  3. If $T = T_f$ for $f \in L^p(\mathbb{R}^d)$, $1 \leq p < \infty$, then $f * \rho_\varepsilon \to f$ in $L^p$.

This is one of the most useful results in analysis: every distribution can be approximated by smooth functions. This is the distributional analogue of sequential density (Proposition 3), made constructive.
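A minimal numerical illustration of part 2 with $T = \delta$: since $\delta * \rho_\varepsilon = \rho_\varepsilon$, the pairings $\langle \rho_\varepsilon, \varphi \rangle$ should approach $\varphi(0) = \langle \delta, \varphi \rangle$. The stand-in $\varphi = \cos$ is illustrative, not a genuine test function:

```python
import math

# Pair the normalized bump mollifier rho_eps against phi = cos and watch
# <rho_eps, phi> -> phi(0) = 1 as eps -> 0.

def bump(x):
    t = 1.0 - x * x
    return math.exp(-1.0 / t) if t > 0 else 0.0

def integrate(g, a, b, n=20000):
    h = (b - a) / n
    return h * (0.5 * (g(a) + g(b)) + sum(g(a + k * h) for k in range(1, n)))

c = integrate(bump, -1.0, 1.0)  # normalizing constant for rho

def pairing(eps):
    # <delta * rho_eps, phi> = \int rho_eps(x) cos(x) dx over B(0, eps)
    return integrate(lambda x: bump(x / eps) / (c * eps) * math.cos(x), -eps, eps)

errs = [abs(pairing(e) - 1.0) for e in (0.5, 0.1, 0.02)]
print(errs)  # decreasing toward 0: the mollified deltas converge to delta
```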