
The Calculus of Distributions

Big Idea

Every operation on distributions is defined by duality: to apply an operation to a distribution $T$, transfer the adjoint operation to the test function. Differentiation becomes the integration-by-parts formula, and the payoff is immediate: every distribution is infinitely differentiable.

The key idea throughout this section is the same: if an operation $A$ on smooth functions satisfies $\langle Af, \varphi \rangle = \langle f, A^\dagger \varphi \rangle$ for some adjoint $A^\dagger$ that maps test functions to test functions, then we define $\langle AT, \varphi \rangle = \langle T, A^\dagger \varphi \rangle$ for any distribution $T$.

A note on notation. We have a functional $T : \mathcal{D}(\Omega) \to \mathbb{R}$, and we want to define a new functional $AT$. The definition always acts on the test function: $AT$ is the composition $T \circ A^\dagger$, and $T$ itself is unchanged. Concretely:

$$\langle D^\alpha T, \varphi \rangle = (-1)^{|\alpha|} \langle T, D^\alpha \varphi \rangle, \qquad \langle fT, \varphi \rangle = \langle T, f\varphi \rangle, \qquad \langle \tau_h T, \varphi \rangle = \langle T, \tau_{-h}\varphi \rangle.$$

In each case the notation ($D^\alpha T$, $fT$, $\tau_h T$) suggests we are acting on $T$, but the operation is transferred to $\varphi$ via the adjoint. This is the adjoint construction from the duality chapter (Definition 1) applied concretely: if $A : \mathcal{D}(\Omega) \to \mathcal{D}(\Omega)$ is a continuous linear map, its adjoint $A^* : \mathcal{D}'(\Omega) \to \mathcal{D}'(\Omega)$ acts by $A^* T = T \circ A$. Differentiation, multiplication, and translation on distributions are all instances of this. The notation is justified by consistency: when $T = T_f$ for a classical function $f$, the distribution $AT$ agrees with the distribution $T_{Af}$ defined by applying $A$ to $f$ directly.

Multiplication by Smooth Functions

Given $f \in C^\infty(\Omega)$ and $T \in \mathcal{D}'(\Omega)$, we want to define a new distribution that deserves the name "$fT$."

Definition 1 (Multiplication by a smooth function)

Let $f \in C^\infty(\Omega)$ and $T \in \mathcal{D}'(\Omega)$. Define $U : \mathcal{D}(\Omega) \to \mathbb{R}$ by

$$U(\varphi) = T(f\varphi).$$

We write $fT := U$ and call it the product of $f$ and $T$.

Note that $U$ is well-defined as a map: $f\varphi \in \mathcal{D}(\Omega)$ whenever $\varphi \in \mathcal{D}(\Omega)$, because the product of a smooth function and a compactly supported smooth function is again compactly supported and smooth. So $T(f\varphi)$ makes sense.

Proposition 1 (Multiplication produces a distribution)

$fT \in \mathcal{D}'(\Omega)$.

Proof 1

Linearity.

$$U(a\varphi + b\psi) = T(f(a\varphi + b\psi)) = T(af\varphi + bf\psi) = a\,T(f\varphi) + b\,T(f\psi) = a\,U(\varphi) + b\,U(\psi).$$

Continuity. Suppose $\varphi_n \to \varphi$ in $\mathcal{D}(\Omega)$ (supports in a fixed compact $K$, all derivatives converge uniformly). Then $f\varphi_n \to f\varphi$ in $\mathcal{D}(\Omega)$: the supports remain in $K$, and the Leibniz rule gives

$$D^\alpha(f\varphi_n) = \sum_{\beta \leq \alpha} \binom{\alpha}{\beta} D^\beta f \cdot D^{\alpha-\beta}\varphi_n \to \sum_{\beta \leq \alpha} \binom{\alpha}{\beta} D^\beta f \cdot D^{\alpha-\beta}\varphi = D^\alpha(f\varphi)$$

uniformly on $K$, since $f$ and all its derivatives are bounded on $K$. Since $T$ is continuous,

$$U(\varphi_n) = T(f\varphi_n) \to T(f\varphi) = U(\varphi).$$

Consistency with functions. When $T = T_g$ for $g \in L^1_{\mathrm{loc}}(\Omega)$, the distribution $fT_g$ agrees with $T_{fg}$:

$$(fT_g)(\varphi) = T_g(f\varphi) = \int_\Omega g(x)\,f(x)\,\varphi(x)\,dx = T_{fg}(\varphi).$$

So the distributional product extends pointwise multiplication of functions.

Example 1 (Multiplying the Dirac delta)

For $f \in C^\infty(\Omega)$: $(f\delta)(\varphi) = \delta(f\varphi) = f(0)\varphi(0) = f(0)\,\delta(\varphi)$. So $f\delta = f(0)\delta$: the delta "evaluates" the smooth factor at its support.

More generally, $f\delta_{x_0} = f(x_0)\delta_{x_0}$.
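Since $\delta_{x_0}$ is a limit of narrow spikes, the identity $f\delta_{x_0} = f(x_0)\delta_{x_0}$ can be illustrated numerically by replacing $\delta_{x_0}$ with a narrow Gaussian. A minimal hedged sketch; the function choices below are illustrative, not from the text:

```python
import math

# Approximate delta_{x0} by a narrow Gaussian g_eps and check that
# <f delta_{x0}, phi> ~ f(x0) phi(x0), for f = cos and a Gaussian-shaped
# smooth phi used purely for illustration.

def g_eps(x, x0, eps):
    # normalized Gaussian spike of width eps centered at x0
    return math.exp(-(x - x0) ** 2 / (2 * eps ** 2)) / (eps * math.sqrt(2 * math.pi))

f = math.cos
phi = lambda x: math.exp(-x * x)

x0, eps = 0.5, 1e-3
n = 40000
a, b = x0 - 8 * eps, x0 + 8 * eps  # Gaussian tails beyond 8*eps are negligible
h = (b - a) / n
# midpoint rule for \int f(x) g_eps(x - x0) phi(x) dx
pairing = sum(f(x) * g_eps(x, x0, eps) * phi(x) * h
              for x in (a + (k + 0.5) * h for k in range(n)))
print(pairing, f(x0) * phi(x0))  # the two values agree closely
```

As $\varepsilon \to 0$ the discrepancy shrinks like $\varepsilon^2$, matching the heuristic that the spike "evaluates" the smooth factor at $x_0$.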

We cannot multiply two arbitrary distributions. The construction above requires $f$ to be smooth (or at least sufficiently regular) so that $\varphi \mapsto f\varphi$ maps $\mathcal{D}(\Omega)$ to itself. The difficulty of multiplying distributions is a fundamental limitation: it is related to the problem of renormalization in quantum field theory and was one of the motivations for Colombeau's theory of generalized functions.

Differentiation of Distributions

With the duality principle now familiar from translation and multiplication, we come to the main payoff: differentiation.

The definition

The definition is motivated by integration by parts. Suppose $f$ is smooth. Then for every $\varphi \in \mathcal{D}(\Omega)$,

$$\int_\Omega f'(x)\,\varphi(x)\,dx = -\int_\Omega f(x)\,\varphi'(x)\,dx$$

since the boundary terms vanish (test functions have compact support in $\Omega$). In the language of distributions: $T_{f'}(\varphi) = -T_f(\varphi')$. The right-hand side makes sense for any distribution $T$, not just those coming from smooth functions: given $T \in \mathcal{D}'(\Omega)$, the map $\varphi \mapsto -T(\varphi')$ is perfectly well-defined. This suggests a definition.
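The motivating integration-by-parts identity can be checked numerically. A hedged sketch, using $f = \sin$ and the standard bump test function; all choices are illustrative:

```python
import math

# Check \int f' phi dx = -\int f phi' dx for f(x) = sin(x) and the bump
# test function phi(x) = exp(-1/(1-x^2)) on (-1, 1), phi = 0 outside.

def phi(x):
    t = 1.0 - x * x
    return math.exp(-1.0 / t) if t > 0 else 0.0

def dphi(x):
    # phi'(x) = -2x/(1-x^2)^2 * phi(x)
    t = 1.0 - x * x
    return -2.0 * x / (t * t) * phi(x) if t > 0 else 0.0

def integrate(g, a, b, n=20000):
    # composite trapezoid rule
    h = (b - a) / n
    return h * (0.5 * (g(a) + g(b)) + sum(g(a + k * h) for k in range(1, n)))

lhs = integrate(lambda x: math.cos(x) * phi(x), -1.0, 1.0)    # \int f' phi
rhs = -integrate(lambda x: math.sin(x) * dphi(x), -1.0, 1.0)  # -\int f phi'
print(lhs, rhs)  # equal up to quadrature error
```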

Definition 2 (Distributional derivative)

Let $T \in \mathcal{D}'(\Omega)$ and $\alpha$ a multi-index. Define $U : \mathcal{D}(\Omega) \to \mathbb{R}$ by

$$U(\varphi) = (-1)^{|\alpha|}\, T(D^\alpha \varphi).$$

We write $D^\alpha T := U$ and call it the distributional derivative of $T$.

Theorem 1 (Every distribution is infinitely differentiable)

For any $T \in \mathcal{D}'(\Omega)$ and any multi-index $\alpha$, $D^\alpha T \in \mathcal{D}'(\Omega)$. In particular, every distribution can be differentiated infinitely many times.

Proof 2

Set $U = D^\alpha T$, i.e. $U(\varphi) = (-1)^{|\alpha|}\,T(D^\alpha\varphi)$.

Linearity.

$$U(a\varphi + b\psi) = (-1)^{|\alpha|}\,T(D^\alpha(a\varphi + b\psi)) = (-1)^{|\alpha|}\,T(a\,D^\alpha\varphi + b\,D^\alpha\psi) = a\,U(\varphi) + b\,U(\psi).$$

Continuity. If $\varphi_n \to \varphi$ in $\mathcal{D}(\Omega)$ (supports in a fixed compact $K$, all derivatives converge uniformly), then $D^\alpha \varphi_n \to D^\alpha \varphi$ in $\mathcal{D}(\Omega)$ (same compact $K$, all derivatives still converge uniformly; differentiating does not enlarge supports). Since $T$ is continuous,

$$U(\varphi_n) = (-1)^{|\alpha|}\,T(D^\alpha \varphi_n) \to (-1)^{|\alpha|}\,T(D^\alpha \varphi) = U(\varphi).$$

Since $U$ is linear and continuous, $U \in \mathcal{D}'(\Omega)$. As $T$ and $\alpha$ were arbitrary, we can differentiate again: $D^\beta(D^\alpha T)$ is again a distribution, and so on indefinitely.

Consistency with functions. When $T = T_f$ for $f \in C^{|\alpha|}(\Omega)$, integration by parts gives

$$(D^\alpha T_f)(\varphi) = (-1)^{|\alpha|}\,T_f(D^\alpha\varphi) = (-1)^{|\alpha|}\int_\Omega f\,D^\alpha\varphi\,dx = \int_\Omega D^\alpha f\,\varphi\,dx = T_{D^\alpha f}(\varphi).$$

So the distributional derivative extends the classical one.

This is the main payoff of the distributional framework: differentiation is always possible, and it always produces another distribution.

The differentiation cascade

The power of distributional derivatives is best seen through a chain of examples in one dimension ($\Omega = \mathbb{R}$).

Example 2 (Heaviside to Dirac)

The Heaviside function $H(x) = \begin{cases} 1 & x > 0 \\ 0 & x < 0 \end{cases}$ is locally integrable and defines a regular distribution. Its distributional derivative is the Dirac delta:

$$\langle H', \varphi \rangle = -\langle H, \varphi' \rangle = -\int_0^\infty \varphi'(x)\,dx = \varphi(0) = \langle \delta, \varphi \rangle.$$

So $H' = \delta$ in $\mathcal{D}'(\mathbb{R})$.
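This pairing can be verified numerically. A sketch, assuming the standard bump test function (an illustrative choice):

```python
import math

# Check <H', phi> = phi(0): by definition <H', phi> = -\int_0^infty phi'(x) dx,
# and phi' vanishes beyond x = 1 for the bump phi(x) = exp(-1/(1-x^2)).

def phi(x):
    t = 1.0 - x * x
    return math.exp(-1.0 / t) if t > 0 else 0.0

def dphi(x):
    # phi'(x) = -2x/(1-x^2)^2 * phi(x)
    t = 1.0 - x * x
    return -2.0 * x / (t * t) * phi(x) if t > 0 else 0.0

n = 20000
h = 1.0 / n
# trapezoid rule for -\int_0^1 phi'(x) dx
pairing = -h * (0.5 * (dphi(0.0) + dphi(1.0)) + sum(dphi(k * h) for k in range(1, n)))
print(pairing, phi(0.0))  # both close to e^{-1} = 0.3678...
```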

Example 3 (The $|x|$ cascade)

The function $f(x) = |x|$ is continuous but not differentiable at the origin. Its distributional derivative is

$$\langle f', \varphi \rangle = -\int_{-\infty}^\infty |x|\,\varphi'(x)\,dx = -\int_{-\infty}^0 (-x)\varphi'(x)\,dx - \int_0^\infty x\,\varphi'(x)\,dx.$$

Integrating by parts on each half-line (boundary terms at $\pm\infty$ vanish because $\varphi$ has compact support; at $0$ the contributions from both sides cancel):

$$= -\int_{-\infty}^0 \varphi(x)\,dx + \int_0^\infty \varphi(x)\,dx = \int_{-\infty}^\infty \operatorname{sgn}(x)\,\varphi(x)\,dx.$$

So $|x|' = \operatorname{sgn}(x)$ as distributions. Differentiating once more:

$$\langle \operatorname{sgn}', \varphi \rangle = -\int_{-\infty}^\infty \operatorname{sgn}(x)\,\varphi'(x)\,dx = \int_{-\infty}^0 \varphi'(x)\,dx - \int_0^\infty \varphi'(x)\,dx = 2\varphi(0) = \langle 2\delta, \varphi \rangle.$$

The full cascade is:

$$|x| \xrightarrow{D} \operatorname{sgn}(x) \xrightarrow{D} 2\delta_0 \xrightarrow{D} 2\delta_0' \xrightarrow{D} \cdots$$

Each step is well-defined as a distributional derivative. The first step produces a discontinuous function, the second a measure, and the third an object that is not even a measure, but all are distributions.
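The first step of the cascade, $\langle |x|', \psi \rangle = \langle \operatorname{sgn}, \psi \rangle$, can be checked numerically. A symmetric test function would make both sides vanish, so the hedged sketch below uses an asymmetric one; all choices are illustrative:

```python
import math

# Check -<|x|, psi'> = <sgn, psi> for the asymmetric, smooth, compactly
# supported psi(x) = (x + 2) * exp(-1/(1-x^2)) on (-1, 1).

def bump(x):
    t = 1.0 - x * x
    return math.exp(-1.0 / t) if t > 0 else 0.0

def dbump(x):
    t = 1.0 - x * x
    return -2.0 * x / (t * t) * bump(x) if t > 0 else 0.0

psi = lambda x: (x + 2.0) * bump(x)
dpsi = lambda x: bump(x) + (x + 2.0) * dbump(x)  # product rule

def integrate(g, a, b, n=20000):
    h = (b - a) / n
    return h * (0.5 * (g(a) + g(b)) + sum(g(a + k * h) for k in range(1, n)))

lhs = -integrate(lambda x: abs(x) * dpsi(x), -1.0, 1.0)        # -<|x|, psi'>
rhs = integrate(psi, 0.0, 1.0) - integrate(psi, -1.0, 0.0)     # <sgn, psi>
print(lhs, rhs)  # equal up to quadrature error
```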

Example 4 (Differentiating $\log|x|$)

The function $\log|x|$ is locally integrable on $\mathbb{R}$ (the singularity at the origin is integrable). Its distributional derivative is the principal value distribution:

$$(\log|x|)' = \mathrm{p.v.}\,\frac{1}{x}$$

in $\mathcal{D}'(\mathbb{R})$. This can be verified by direct computation:

$$\langle (\log|x|)', \varphi \rangle = -\int_{-\infty}^\infty \log|x|\,\varphi'(x)\,dx = \lim_{\varepsilon \to 0} \int_{|x|>\varepsilon} \frac{\varphi(x)}{x}\,dx$$

where the last equality follows from integration by parts on $(-\infty, -\varepsilon)$ and $(\varepsilon, \infty)$ and observing that the boundary terms $\log\varepsilon\,[\varphi(\varepsilon) - \varphi(-\varepsilon)] \to 0$ since $\varphi$ is smooth.

Properties of distributional differentiation

Proposition 2 (Basic properties)

Let $S, T \in \mathcal{D}'(\Omega)$, $\alpha, \beta$ multi-indices, $c \in \mathbb{R}$.

  1. Linearity: $D^\alpha(cS + T) = cD^\alpha S + D^\alpha T$.

  2. Commutativity of mixed partials: $D^\alpha D^\beta T = D^\beta D^\alpha T = D^{\alpha + \beta} T$.

  3. Consistency: If $f \in C^{|\alpha|}(\Omega)$, the distributional derivative $D^\alpha T_f$ agrees with the classical derivative $T_{D^\alpha f}$.

  4. Continuity: If $T_n \to T$ in $\mathcal{D}'(\Omega)$, then $D^\alpha T_n \to D^\alpha T$ in $\mathcal{D}'(\Omega)$.

Proof 3

All four follow directly from the definition $\langle D^\alpha T, \varphi \rangle = (-1)^{|\alpha|} \langle T, D^\alpha \varphi \rangle$.

  1. Linearity of $T$ in the pairing.

  2. $\langle D^\alpha D^\beta T, \varphi \rangle = (-1)^{|\alpha|} \langle D^\beta T, D^\alpha \varphi \rangle = (-1)^{|\alpha|+|\beta|} \langle T, D^{\alpha+\beta}\varphi \rangle$, which is symmetric in $\alpha, \beta$ since partial derivatives of smooth functions commute.

  3. For $f \in C^{|\alpha|}$, integration by parts gives $(-1)^{|\alpha|} \int f\,D^\alpha \varphi = \int (D^\alpha f)\,\varphi$.

  4. If $\langle T_n, \psi \rangle \to \langle T, \psi \rangle$ for all $\psi \in \mathcal{D}$, then in particular for $\psi = D^\alpha \varphi$: $\langle D^\alpha T_n, \varphi \rangle = (-1)^{|\alpha|} \langle T_n, D^\alpha \varphi \rangle \to (-1)^{|\alpha|} \langle T, D^\alpha \varphi \rangle = \langle D^\alpha T, \varphi \rangle$.

Property 4 is remarkable: you can always interchange limits and derivatives in the sense of distributions. In classical analysis, this requires uniform convergence of derivatives. In the distributional framework, pointwise convergence of the functionals is enough, because the derivative is transferred to the (fixed, smooth) test function.
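A classic illustration of this interchange: $f_n(x) = \sin(nx)/n \to 0$ uniformly, so $T_{f_n} \to 0$ in $\mathcal{D}'$, and Property 4 then forces $f_n'(x) = \cos(nx) \to 0$ in $\mathcal{D}'$ even though $\cos(nx)$ converges at no point. A hedged numerical sketch (the test-function choice is illustrative):

```python
import math

# Pair cos(nx) against the bump test function phi(x) = exp(-1/(1-x^2)) on
# (-1, 1) and watch the pairings shrink toward 0 as n grows.

def phi(x):
    t = 1.0 - x * x
    return math.exp(-1.0 / t) if t > 0 else 0.0

def pairing(freq, n=40000):
    # trapezoid rule for <cos(freq x), phi> = \int_{-1}^{1} cos(freq x) phi(x) dx
    h = 2.0 / n
    return h * (0.5 * (math.cos(-freq) * phi(-1.0) + math.cos(freq) * phi(1.0))
                + sum(math.cos(freq * (-1.0 + k * h)) * phi(-1.0 + k * h)
                      for k in range(1, n)))

vals = [abs(pairing(m)) for m in (1, 10, 100)]
print(vals)  # decreasing toward 0
```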

Proposition 3 (Leibniz rule for distributions)

For $f \in C^\infty(\Omega)$ and $T \in \mathcal{D}'(\Omega)$:

$$D(fT) = f'T + fT'$$

and more generally, for any multi-index $\alpha$:

$$D^\alpha(fT) = \sum_{\beta \leq \alpha} \binom{\alpha}{\beta} D^\beta f \cdot D^{\alpha - \beta} T.$$

Proof 4

For the first-order case in one dimension:

$$\langle D(fT), \varphi \rangle = -\langle fT, \varphi' \rangle = -\langle T, f\varphi' \rangle.$$

Now $f\varphi' = (f\varphi)' - f'\varphi$, so

$$= -\langle T, (f\varphi)' \rangle + \langle T, f'\varphi \rangle = \langle T', f\varphi \rangle + \langle f'T, \varphi \rangle = \langle fT', \varphi \rangle + \langle f'T, \varphi \rangle.$$

The general case follows by induction on $|\alpha|$.

Taylor’s theorem for distributions

Now that we have both translation and differentiation, we can combine them into a distributional Taylor expansion. The idea is simple: to expand $\tau_h T$ in powers of $h$, apply $\tau_h T$ to a test function and use the classical Taylor expansion of $\varphi(x + h)$ in $h$.

Theorem 2 (Taylor’s theorem for distributions)

Let $T \in \mathcal{D}'(\mathbb{R}^d)$ and $h \in \mathbb{R}^d$. Then for every $N \geq 0$,

$$\tau_h T = \sum_{|\alpha| \leq N} \frac{(-h)^\alpha}{\alpha!}\, D^\alpha T + R_N$$

where the remainder $R_N \in \mathcal{D}'(\mathbb{R}^d)$ satisfies

$$R_N(\varphi) = T\!\left( \sum_{|\alpha| = N+1} \frac{h^\alpha}{\alpha!}\,(N+1) \int_0^1 (1-t)^N\, (D^\alpha\varphi)(\cdot + th)\,dt \right)$$

for all $\varphi \in \mathcal{D}(\mathbb{R}^d)$.

Proof 5

By definition, $(\tau_h T)(\varphi) = T(\tau_{-h}\varphi) = T(\varphi(\cdot + h))$. Since $\varphi$ is smooth, we apply the classical Taylor expansion in the variable $h$: for each fixed $x$,

$$\varphi(x + h) = \sum_{|\alpha| \leq N} \frac{h^\alpha}{\alpha!}\, D^\alpha\varphi(x) + \sum_{|\alpha| = N+1} \frac{h^\alpha}{\alpha!}\,(N+1) \int_0^1 (1-t)^N\, (D^\alpha\varphi)(x + th)\,dt.$$

This expansion, and all its $x$-derivatives, converge uniformly on compact sets (the remainder and its derivatives are controlled by finitely many derivatives of $\varphi$ on a compact set containing $\operatorname{supp}(\varphi) + [0,1] \cdot h$). So the expansion holds in $\mathcal{D}(\mathbb{R}^d)$. Applying $T$:

$$(\tau_h T)(\varphi) = \sum_{|\alpha| \leq N} \frac{h^\alpha}{\alpha!}\, T(D^\alpha\varphi) + T(\text{remainder}).$$

Now $T(D^\alpha\varphi) = (-1)^{|\alpha|}(D^\alpha T)(\varphi)$, so

$$(\tau_h T)(\varphi) = \sum_{|\alpha| \leq N} \frac{(-h)^\alpha}{\alpha!}\, (D^\alpha T)(\varphi) + R_N(\varphi).$$

Example 5 (Taylor expansion of $\delta_h$)

Taking $T = \delta$ and $d = 1$: since $\tau_h\delta = \delta_h$ (Example 9), the Taylor formula gives

$$\delta_h = \sum_{n=0}^{N} \frac{(-h)^n}{n!}\, \delta^{(n)} + R_N.$$

Let us verify this directly. Applying both sides to a test function $\varphi$:

$$\varphi(h) = \sum_{n=0}^{N} \frac{(-h)^n}{n!}\, \delta^{(n)}(\varphi) + R_N(\varphi) = \sum_{n=0}^{N} \frac{(-h)^n}{n!}\,(-1)^n\,\varphi^{(n)}(0) + R_N(\varphi) = \sum_{n=0}^{N} \frac{h^n}{n!}\,\varphi^{(n)}(0) + R_N(\varphi).$$

This is exactly the classical Taylor expansion of $\varphi$ at $0$, evaluated at $h$. The distributional Taylor theorem for $\delta$ is nothing but the classical Taylor theorem for the test function, read through the duality.
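This pairing identity can be tested numerically. The hedged sketch below uses $\varphi(x) = e^{-x^2}$, which is smooth but not compactly supported, purely as an illustration of the classical Taylor identity being invoked:

```python
import math

# Check phi(h) = sum_{n<=N} h^n/n! phi^{(n)}(0) + R_N for phi(x) = exp(-x^2).
# From the series exp(-x^2) = sum_k (-1)^k x^{2k}/k!, the even derivatives
# satisfy phi^{(2k)}(0) = (-1)^k (2k)!/k! and odd derivatives vanish at 0,
# so the degree-N partial sum collapses to sum_{2k<=N} (-h^2)^k / k!.

h = 0.3

def partial_sum(N):
    return sum((-h * h) ** k / math.factorial(k) for k in range(N // 2 + 1))

exact = math.exp(-h * h)  # phi(h), i.e. the pairing <delta_h, phi>
errors = [abs(exact - partial_sum(N)) for N in (0, 2, 4, 6)]
print(errors)  # errors shrink as N grows
```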

Weak Derivatives

Distributional derivatives exist for every distribution, but they may not be representable by a function. When they are, we get the notion of a weak derivative, which bridges distribution theory and Sobolev spaces.

Definition 3 (Weak derivative)

Let $u \in L^1_{\mathrm{loc}}(\Omega)$. We say $u$ has a weak derivative $D^\alpha u = v$ if there exists $v \in L^1_{\mathrm{loc}}(\Omega)$ such that

$$\int_\Omega u\,D^\alpha \varphi\,dx = (-1)^{|\alpha|} \int_\Omega v\,\varphi\,dx \qquad \text{for all } \varphi \in \mathcal{D}(\Omega).$$

When it exists, the weak derivative is unique (up to a.e. equivalence).

In the language of adjoints: the distributional derivative $D^\alpha T_u$ is the functional $\varphi \mapsto (-1)^{|\alpha|} \int u\,D^\alpha\varphi\,dx$. This always defines an element of $\mathcal{D}'(\Omega)$. The question is whether this functional can be represented by a function $v$ via $\varphi \mapsto \int v\,\varphi\,dx$. When it can, $v = D^\alpha u$ is the weak derivative of $u$. If moreover $u \in L^p$ and $D^\alpha u \in L^p$, then $u$ belongs to a Sobolev space $W^{k,p}(\Omega)$: the space of $L^p$ functions whose weak derivatives up to order $k$ are also in $L^p$. Sobolev spaces are the natural home for PDE solutions that are not classically differentiable, and their theory rests entirely on the distinction between distributional and weak derivatives.

Example 6 ($|x|$ has a weak derivative)

By Example 3, the distributional derivative of $|x|$ is $\operatorname{sgn}(x)$. Since $\operatorname{sgn} \in L^1_{\mathrm{loc}}(\mathbb{R})$, this is a genuine weak derivative: $|x|' = \operatorname{sgn}(x)$ weakly.

Note that $|x|$ is not classically differentiable at the origin, but the single point $\{0\}$ has measure zero and does not affect the $L^1_{\mathrm{loc}}$ function $\operatorname{sgn}(x)$.

Example 7 (The Heaviside function has no weak derivative in $L^1_{\mathrm{loc}}$)

By Example 2, $H' = \delta$ as distributions. But $\delta$ is not a regular distribution: there is no $v \in L^1_{\mathrm{loc}}(\mathbb{R})$ with $\int v\,\varphi\,dx = \varphi(0)$ for all test functions $\varphi$. So $H$ does not have a weak derivative.

This is why $H \in L^p(\Omega)$ for any bounded $\Omega$ but $H \notin W^{1,p}(\Omega)$ for any $p$: membership in the Sobolev space requires the weak derivative to exist as a function in $L^p$.

Example 8 (Characteristic function of an interval)

The function $u = \mathbf{1}_{(a,b)}$ has distributional derivative $u' = \delta_a - \delta_b$ (a difference of point masses). This is a measure but not in $L^1_{\mathrm{loc}}$, so $u$ has no weak derivative. Like the Heaviside function, $u \in L^p$ but $u \notin W^{1,p}$ for any $p$.

The weak derivative extends the classical derivative in the following sense:

  • If $f$ is classically differentiable (or even just absolutely continuous), its weak derivative exists and agrees with the classical derivative a.e.

  • Weak differentiability allows corners and kinks (finitely many, or even countably many, as long as the derivative remains locally integrable).

  • Weak differentiability does not allow jumps: a jump discontinuity produces a delta function in the derivative, which is not in $L^1_{\mathrm{loc}}$.

The dividing line is absolute continuity: a function $u$ on an interval has a weak derivative in $L^1_{\mathrm{loc}}$ if and only if $u$ is (equivalent to) an absolutely continuous function.

Translation and Reflection

The simplest operations on distributions illustrate the duality principle with no calculus required.

Recall that for functions on $\mathbb{R}^d$, translation by $h \in \mathbb{R}^d$ is $(\tau_h \varphi)(x) = \varphi(x - h)$ and reflection is $\check{\varphi}(x) = \varphi(-x)$. Given a distribution $T \in \mathcal{D}'(\mathbb{R}^d)$, we want to define the translated distribution $\tau_h T$ and the reflected distribution $\check{T}$.

Definition 4 (Translation and reflection of distributions)

Let $T \in \mathcal{D}'(\mathbb{R}^d)$.

  1. Define $U_h : \mathcal{D}(\mathbb{R}^d) \to \mathbb{R}$ by $U_h(\varphi) = T(\tau_{-h}\varphi)$. We write $\tau_h T := U_h$ and call it the translation of $T$ by $h$.

  2. Define $V : \mathcal{D}(\mathbb{R}^d) \to \mathbb{R}$ by $V(\varphi) = T(\check{\varphi})$. We write $\check{T} := V$ and call it the reflection of $T$.

Proposition 4 (Translation and reflection produce distributions)

$\tau_h T$ and $\check{T}$ are distributions, i.e. they belong to $\mathcal{D}'(\mathbb{R}^d)$.

Proof 6

We verify the two requirements for $U_h = \tau_h T$; the argument for $V = \check{T}$ is identical.

Linearity.

$$U_h(a\varphi + b\psi) = T(\tau_{-h}(a\varphi + b\psi)) = T(a\,\tau_{-h}\varphi + b\,\tau_{-h}\psi) = a\,T(\tau_{-h}\varphi) + b\,T(\tau_{-h}\psi) = a\,U_h(\varphi) + b\,U_h(\psi).$$

Continuity. If $\varphi_n \to \varphi$ in $\mathcal{D}(\mathbb{R}^d)$ (supports in a fixed compact $K$, all derivatives converge uniformly), then $\tau_{-h}\varphi_n \to \tau_{-h}\varphi$ in $\mathcal{D}(\mathbb{R}^d)$ (supports in the fixed compact $K - h$, all derivatives still converge uniformly). Since $T$ is continuous,

$$U_h(\varphi_n) = T(\tau_{-h}\varphi_n) \to T(\tau_{-h}\varphi) = U_h(\varphi).$$

Consistency with functions. When $T = T_f$ for $f \in L^1_{\mathrm{loc}}$, the distribution $\tau_h T_f$ agrees with $T_{\tau_h f}$:

$$(\tau_h T_f)(\varphi) = T_f(\tau_{-h}\varphi) = \int f(x)\,\varphi(x+h)\,dx = \int f(y-h)\,\varphi(y)\,dy = T_{\tau_h f}(\varphi).$$

So the distributional definition extends the classical one.

Example 9 (Translation of the Dirac delta)

$(\tau_h \delta)(\varphi) = \delta(\tau_{-h}\varphi) = (\tau_{-h}\varphi)(0) = \varphi(h) = \delta_h(\varphi)$. So $\tau_h \delta = \delta_h$: translating the delta moves the point of evaluation.

Convolution

Convolution with a test function is a smoothing operation: it always produces a smooth function, even when applied to a distribution. The definition uses translation and reflection from the preceding section.

Convolution of functions (review)

For $f \in L^1_{\mathrm{loc}}(\mathbb{R}^d)$ and $\psi \in \mathcal{D}(\mathbb{R}^d)$:

$$(f * \psi)(x) = \int_{\mathbb{R}^d} f(y)\,\psi(x - y)\,dy.$$

The result $f * \psi$ is smooth ($C^\infty$) and satisfies $D^\alpha(f * \psi) = f * D^\alpha \psi$.

Convolution of a distribution with a test function

Definition 5 (Convolution with a test function)

Let $T \in \mathcal{D}'(\mathbb{R}^d)$ and $\psi \in \mathcal{D}(\mathbb{R}^d)$. The convolution $T * \psi$ is the function

$$(T * \psi)(x) = \langle T, \tau_x\check{\psi} \rangle = \langle T_y, \psi(x - y) \rangle$$

where $\tau_x$ is translation (Definition 4), $\check{\psi}$ is reflection, and $T$ acts on the $y$ variable.

When $T = T_f$ for a locally integrable function $f$, this recovers the usual convolution: $(T_f * \psi)(x) = \int f(y)\,\psi(x-y)\,dy = (f * \psi)(x)$.

Theorem 3 (Regularization by convolution)

Let $T \in \mathcal{D}'(\mathbb{R}^d)$ and $\psi \in \mathcal{D}(\mathbb{R}^d)$. Then:

  1. $T * \psi \in C^\infty(\mathbb{R}^d)$: the result is always a smooth function.

  2. $D^\alpha(T * \psi) = (D^\alpha T) * \psi = T * (D^\alpha \psi)$.

  3. $\operatorname{supp}(T * \psi) \subseteq \operatorname{supp}(T) + \operatorname{supp}(\psi)$ (Minkowski sum).

Proof 7

Smoothness. Fix $x_0 \in \mathbb{R}^d$ and consider the map $x \mapsto \tau_x\check{\psi}$ from $\mathbb{R}^d$ into $\mathcal{D}(\mathbb{R}^d)$. This map is smooth: for any direction $e_i$,

$$\lim_{h \to 0} \frac{\tau_{x_0 + he_i}\check{\psi} - \tau_{x_0}\check{\psi}}{h} = \partial_i[\tau_{x_0}\check{\psi}]$$

and the convergence holds in $\mathcal{D}(\mathbb{R}^d)$ (all supports stay in a fixed compact set, all derivatives converge uniformly). Applying the continuous linear functional $T$:

$$\partial_i(T * \psi)(x_0) = \lim_{h \to 0} \frac{\langle T, \tau_{x_0 + he_i}\check{\psi}\rangle - \langle T, \tau_{x_0}\check{\psi} \rangle}{h} = \langle T, \partial_i[\tau_{x_0}\check{\psi}] \rangle.$$

Iterating gives smoothness to all orders.

Interchange of derivative and convolution. From the proof above, $D^\alpha(T * \psi)(x) = \langle T, D^\alpha_x \psi(x - \cdot) \rangle = (T * D^\alpha \psi)(x)$. For the other identity, $\langle D^\alpha T, \psi(x - \cdot) \rangle = (-1)^{|\alpha|} \langle T, D^\alpha_y \psi(x - y) \rangle = \langle T, D^\alpha_x \psi(x - \cdot) \rangle$, using the chain rule $D^\alpha_y \psi(x - y) = (-1)^{|\alpha|} D^\alpha_x \psi(x - y)$.
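A concrete instance of the theorem, sketched numerically: for $T = T_H$ (Heaviside), $(H * \psi)(x) = \int_{-\infty}^x \psi(u)\,du$ is a smooth antiderivative of $\psi$, and property 2 together with $H' = \delta$ gives $D(H * \psi) = \delta * \psi = \psi$. Function choices below are illustrative:

```python
import math

# Convolving the Heaviside step with the bump test function psi yields a
# smooth function whose derivative is psi itself.

def psi(x):
    t = 1.0 - x * x
    return math.exp(-1.0 / t) if t > 0 else 0.0

def conv(x, n=20000):
    # (H * psi)(x) = \int_{-1}^{x} psi(u) du, since psi vanishes below -1
    if x <= -1.0:
        return 0.0
    h = (x + 1.0) / n
    return h * (0.5 * (psi(-1.0) + psi(x)) + sum(psi(-1.0 + k * h) for k in range(1, n)))

x0, d = 0.25, 1e-4
deriv = (conv(x0 + d) - conv(x0 - d)) / (2 * d)  # central difference of H * psi
print(deriv, psi(x0))  # both close
```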

Approximation to the identity

Definition 6 (Mollifier)

A mollifier (or approximation to the identity) is a family $\{\rho_\varepsilon\}_{\varepsilon > 0}$ of test functions satisfying:

  1. $\rho_\varepsilon \geq 0$,

  2. $\int \rho_\varepsilon = 1$,

  3. $\operatorname{supp}(\rho_\varepsilon) \subseteq B(0, \varepsilon)$.

The standard choice is $\rho_\varepsilon(x) = \varepsilon^{-d}\rho(x/\varepsilon)$ where $\rho$ is a fixed non-negative test function with $\int \rho = 1$ supported in the unit ball.
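In $d = 1$ this standard choice can be written down explicitly. A hedged sketch, with the normalizing constant computed numerically:

```python
import math

# rho(x) = bump(x)/c on (-1, 1), with c chosen so that \int rho = 1, and
# rho_eps(x) = rho(x/eps)/eps.  All three mollifier properties can be checked.

def bump(x):
    t = 1.0 - x * x
    return math.exp(-1.0 / t) if t > 0 else 0.0

def integrate(g, a, b, n=20000):
    h = (b - a) / n
    return h * (0.5 * (g(a) + g(b)) + sum(g(a + k * h) for k in range(1, n)))

c = integrate(bump, -1.0, 1.0)  # normalizing constant, roughly 0.444
rho = lambda x: bump(x) / c
rho_eps = lambda x, eps: rho(x / eps) / eps

eps = 0.1
mass = integrate(lambda x: rho_eps(x, eps), -eps, eps)
print(c, mass)  # mass is approx 1; rho_eps vanishes outside B(0, eps)
```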

Theorem 4 (Mollification approximates distributions)

Let $T \in \mathcal{D}'(\mathbb{R}^d)$ and $\{\rho_\varepsilon\}$ a mollifier. Then:

  1. Each $T * \rho_\varepsilon$ is a smooth function.

  2. $T * \rho_\varepsilon \to T$ in $\mathcal{D}'(\mathbb{R}^d)$ as $\varepsilon \to 0$.

  3. If $T = T_f$ for $f \in L^p(\mathbb{R}^d)$, $1 \leq p < \infty$, then $f * \rho_\varepsilon \to f$ in $L^p$.

This is one of the most useful results in analysis: every distribution can be approximated by smooth functions. This is the distributional analogue of sequential density (Proposition 3), made constructive.
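A minimal numerical illustration of part 2 with $T = \delta$: since $\delta * \rho_\varepsilon = \rho_\varepsilon$, the pairings $\langle \rho_\varepsilon, \varphi \rangle$ should approach $\varphi(0) = \langle \delta, \varphi \rangle$. The stand-in $\varphi = \cos$ is illustrative, not a genuine test function:

```python
import math

# Pair the normalized bump mollifier rho_eps against phi = cos and watch
# <rho_eps, phi> -> phi(0) = 1 as eps -> 0.

def bump(x):
    t = 1.0 - x * x
    return math.exp(-1.0 / t) if t > 0 else 0.0

def integrate(g, a, b, n=20000):
    h = (b - a) / n
    return h * (0.5 * (g(a) + g(b)) + sum(g(a + k * h) for k in range(1, n)))

c = integrate(bump, -1.0, 1.0)  # normalizing constant for rho

def pairing(eps):
    # <delta * rho_eps, phi> = \int rho_eps(x) cos(x) dx over B(0, eps)
    return integrate(lambda x: bump(x / eps) / (c * eps) * math.cos(x), -eps, eps)

errs = [abs(pairing(e) - 1.0) for e in (0.5, 0.1, 0.02)]
print(errs)  # decreasing toward 0: the mollified deltas converge to delta
```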