Every operation on distributions is defined by duality: to apply an
operation to a distribution T, transfer the adjoint operation to the test
function. Differentiation becomes the integration-by-parts formula, and the
payoff is immediate: every distribution is infinitely differentiable.
The key idea throughout this section is the same: if an operation A on
smooth functions satisfies ⟨Af, φ⟩ = ⟨f, A†φ⟩ for some adjoint A† that maps test
functions to test functions, then we define ⟨AT, φ⟩ = ⟨T, A†φ⟩ for any distribution T.
A note on notation. We have a functional
T:D(Ω)→R, and we want to define a new
functional AT. The definition always acts on the test function: AT is
the composition T∘A†, and T itself is unchanged.
Concretely:
D^α T = (−1)^{|α|} T ∘ D^α, i.e. φ ↦ (−1)^{|α|} T(D^α φ),
fT = T ∘ M_f, i.e. φ ↦ T(fφ), where M_f
is the multiplication operator,
τ_h T = T ∘ τ_{−h}, i.e.
φ ↦ T(τ_{−h}φ).
In each case the notation (D^α T, fT, τ_h T) suggests we are
acting on T, but the operation is transferred to φ via the
adjoint. This is the adjoint construction from the duality chapter
(Definition 1) applied concretely: if
A : D(Ω) → D(Ω) is a continuous linear
map, its adjoint A* : D′(Ω) → D′(Ω)
acts by A*T = T ∘ A. Differentiation, multiplication, and
translation on distributions are all instances of this. The notation is
justified by consistency: when T = T_f for a classical function f, the
distribution AT agrees with the distribution T_{Af} defined by applying
A to f directly.
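As a numerical sanity check of this consistency, the adjoint construction can be sketched in a few lines: distributions are functionals on sampled test functions, and each operation precomposes with a map on the test function. The grid, the bump, and the helper names (`regular`, `multiply`, `translate`) are our choices, not part of the theory.

```python
import numpy as np

# Sketch of the adjoint construction A*T = T o A on a grid.
x = np.linspace(-4.0, 4.0, 8001)
dx = x[1] - x[0]

def regular(fvals):
    """T_f(phi) = integral of f*phi (Riemann sum on the grid)."""
    return lambda phi: float(np.sum(fvals * phi) * dx)

def multiply(fvals, T):
    """<f T, phi> = <T, f*phi>: multiplication via the adjoint."""
    return lambda phi: T(fvals * phi)

def translate(T, h):
    """<tau_h T, phi> = <T, tau_{-h} phi>, i.e. phi evaluated at x + h."""
    return lambda phi: T(np.interp(x + h, x, phi))

# smooth bump test function supported in (-1, 1)
phi = np.zeros_like(x)
m = np.abs(x) < 1
phi[m] = np.exp(-1.0 / (1.0 - x[m] ** 2))

T = regular(np.sin(x))

# consistency: the adjoint definitions agree with acting on f directly
lhs_mult = multiply(np.cos(x), T)(phi)         # <cos . T_sin, phi>
rhs_mult = regular(np.cos(x) * np.sin(x))(phi) # <T_{cos sin}, phi>
lhs_trans = translate(T, 0.5)(phi)             # <tau_{1/2} T_sin, phi>
rhs_trans = regular(np.sin(x - 0.5))(phi)      # <T_{sin(. - 1/2)}, phi>
```

Both pairs of pairings agree up to quadrature rounding, illustrating that AT = T_{Af} for regular distributions.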
We write fT := U, where U(φ) = T(fφ), and call it the product of f and T.
Note that U is well-defined as a map: fφ ∈ D(Ω)
whenever φ ∈ D(Ω), because the product of a smooth
function and a compactly supported smooth function is again compactly
supported and smooth. So T(fφ) makes sense.
Proposition 1 (Multiplication produces a distribution)
fT is a distribution, i.e. fT ∈ D′(Ω).
Continuity. Suppose φ_n → φ in D(Ω)
(supports in a fixed compact K, all derivatives converge uniformly).
Then fφ_n → fφ in D(Ω): the supports
remain in K, and the Leibniz rule gives
D^α(fφ_n − fφ) = Σ_{β≤α} (α choose β) D^β f · D^{α−β}(φ_n − φ) → 0 uniformly on K,
since each D^β f is bounded on K. Hence T(fφ_n) → T(fφ), i.e. fT(φ_n) → fT(φ).
We cannot multiply two arbitrary distributions. The construction above
requires f to be smooth (or at least sufficiently regular) so that
φ↦fφ maps D(Ω) to itself. The
difficulty of multiplying distributions is a fundamental limitation: it is
related to the problem of renormalization in quantum field theory and was
one of the motivations for Colombeau’s theory of generalized functions.
For f ∈ C^1(Ω) and a test function φ, integration by parts gives
∫_Ω f′φ dx = −∫_Ω f φ′ dx,
since the boundary terms vanish (test functions have compact support in
Ω). In the language of distributions:
T_{f′}(φ) = −T_f(φ′). The right-hand side makes sense for
any distribution T, not just those coming from smooth functions: given
T ∈ D′(Ω), the map φ ↦ −T(φ′) is
perfectly well-defined. This suggests a definition.
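The motivating identity is easy to check numerically before taking it as a definition. A minimal sketch (grid, test bump, and tolerance are our choices):

```python
import numpy as np

# Check <T_{f'}, phi> = -<T_f, phi'> for a smooth f on a grid.
x = np.linspace(-4.0, 4.0, 8001)
dx = x[1] - x[0]

# smooth bump test function supported in (-1, 1)
phi = np.zeros_like(x)
m = np.abs(x) < 1
phi[m] = np.exp(-1.0 / (1.0 - x[m] ** 2))
phi_prime = np.gradient(phi, dx)                 # phi' on the grid

f, f_prime = np.sin(x), np.cos(x)
classical = float(np.sum(f_prime * phi) * dx)    # <T_{f'}, phi>
by_duality = -float(np.sum(f * phi_prime) * dx)  # -<T_f, phi'>
```

The two numbers agree up to discretization error, and only the right-hand side survives when f is no longer differentiable.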
Continuity. If φ_n → φ in D(Ω)
(supports in a fixed compact K, all derivatives converge uniformly),
then D^α φ_n → D^α φ in D(Ω)
(same compact K, all derivatives still converge uniformly — differentiating
does not enlarge supports). Since T is continuous,
(D^α T)(φ_n) = (−1)^{|α|} T(D^α φ_n) → (−1)^{|α|} T(D^α φ) = (D^α T)(φ).
Since D^α T is linear and continuous, D^α T ∈ D′(Ω). As T
and α were arbitrary, we can differentiate again: D^β(D^α T)
is again a distribution, and so on indefinitely.
Each step is well-defined as a distributional derivative. The first step
produces a discontinuous function, the second a measure, and the third an
object that is not even a measure, but all are distributions.
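One concrete instance of such a chain starts from |x|: differentiating twice should produce 2δ, and this can be checked by pairing with a test function. A numerical sketch (grid and bump are our choices; the pairing ⟨T″, φ⟩ = ⟨T, φ″⟩ uses the definition with |α| = 2):

```python
import numpy as np

# <(T_|x|)'', phi> = <T_|x|, phi''> = integral |x| phi''(x) dx,
# which should equal 2*phi(0), since |x|'' = 2*delta.
x = np.linspace(-4.0, 4.0, 8001)
dx = x[1] - x[0]

# smooth bump test function supported in (-1, 1)
phi = np.zeros_like(x)
m = np.abs(x) < 1
phi[m] = np.exp(-1.0 / (1.0 - x[m] ** 2))
phi2 = np.gradient(np.gradient(phi, dx), dx)    # phi'' on the grid

pairing = float(np.sum(np.abs(x) * phi2) * dx)  # <T'', phi>
expected = 2.0 * phi[len(x) // 2]               # 2*phi(0), grid center is 0
```

The kink of |x| never enters the computation: both derivatives land on the smooth test function.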
The function log|x| is locally integrable on R (the
singularity at the origin is integrable). Its distributional derivative is
the principal value distribution:
⟨(log|x|)′, φ⟩ = −∫ log|x| φ′(x) dx = lim_{ε→0⁺} ∫_{|x|≥ε} φ(x)/x dx = ⟨pv(1/x), φ⟩,
where the second equality follows from integration by parts on
(−∞,−ε) and (ε,∞) and observing that
the boundary terms log ε · [φ(ε) − φ(−ε)] → 0, since φ(ε) − φ(−ε) = O(ε)
(φ is smooth) and ε log ε → 0.
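The two pairings can be compared numerically on a midpoint grid that avoids x = 0, so that the symmetric sum of φ(x)/x realizes the principal value. The off-center bump (our choice) keeps the answer nonzero:

```python
import numpy as np

# Compare -integral log|x| phi'(x) dx with the symmetric (pv) sum of
# phi(x)/x. Midpoint grid: symmetric about 0, never hits x = 0.
dx = 0.001
x = -4.0 + (np.arange(8000) + 0.5) * dx

t = x - 0.5                                  # bump centered at 0.5
phi = np.zeros_like(x)
m = np.abs(t) < 1
phi[m] = np.exp(-1.0 / (1.0 - t[m] ** 2))
phi_prime = np.zeros_like(x)                 # analytic derivative of the bump
phi_prime[m] = phi[m] * (-2.0 * t[m]) / (1.0 - t[m] ** 2) ** 2

lhs = -float(np.sum(np.log(np.abs(x)) * phi_prime) * dx)  # -<T_log, phi'>
rhs = float(np.sum(phi / x) * dx)                         # pv of phi(x)/x
```

The sums agree up to quadrature error; the large values of φ(x)/x near 0 cancel in symmetric pairs, exactly as in the ε-limit.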
Commutativity of mixed partials: D^α D^β T = D^β D^α T = D^{α+β} T.
Consistency: If f ∈ C^{|α|}(Ω), the distributional
derivative D^α T_f agrees with the classical derivative
T_{D^α f}.
Continuity: If T_n → T in D′(Ω), then
D^α T_n → D^α T in D′(Ω).
Proof 3
All four follow directly from the definition ⟨D^α T, φ⟩ = (−1)^{|α|}⟨T, D^α φ⟩.
Linearity: immediate from linearity of T in the pairing.
Commutativity:
⟨D^α D^β T, φ⟩ = (−1)^{|α|}⟨D^β T, D^α φ⟩ = (−1)^{|α|+|β|}⟨T, D^{α+β} φ⟩, which is symmetric in
α, β since partial derivatives of smooth functions commute.
Consistency: for f ∈ C^{|α|}, integration by parts gives (−1)^{|α|} ∫ f D^α φ = ∫ (D^α f) φ.
Continuity: if ⟨T_n, ψ⟩ → ⟨T, ψ⟩ for all
ψ ∈ D, then in particular for ψ = D^α φ: ⟨D^α T_n, φ⟩ = (−1)^{|α|}⟨T_n, D^α φ⟩ → (−1)^{|α|}⟨T, D^α φ⟩ = ⟨D^α T, φ⟩.
Property 4 is remarkable: you can always interchange limits and
derivatives in the sense of distributions. In classical analysis, this
requires uniform convergence of derivatives. In the distributional
framework, pointwise convergence of the functionals is enough, because the
derivative is transferred to the (fixed, smooth) test function.
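A classical illustration (our choice of example): f_n(x) = sin(nx)/n → 0 uniformly, so T_{f_n} → 0 in D′, and by Property 4 the derivatives cos(nx) must also tend to 0 distributionally, even though cos(nx) has no pointwise limit. The pairings can be watched shrinking numerically:

```python
import numpy as np

# Pairings <cos(n .), phi> tend to 0 as n grows, although cos(n x)
# does not converge pointwise. Fine grid so the oscillations resolve.
x = np.linspace(-4.0, 4.0, 80001)
dx = x[1] - x[0]

# smooth bump test function supported in (-1, 1)
phi = np.zeros_like(x)
m = np.abs(x) < 1
phi[m] = np.exp(-1.0 / (1.0 - x[m] ** 2))

pairings = [abs(float(np.sum(np.cos(n * x) * phi) * dx)) for n in (1, 10, 100)]
```

The decay is fast because the test function is smooth: the oscillations of cos(nx) average out against the fixed bump.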
Now that we have both translation and differentiation, we can combine them
into a distributional Taylor expansion. The idea is simple: to expand
τ_h T in powers of h, apply τ_h T to a test function and use
the classical Taylor expansion of φ(x+h) in h.
This expansion, and all its x-derivatives, converge uniformly on compact
sets (the remainder and its derivatives are controlled by finitely many
derivatives of φ on a compact set containing
supp(φ) + [0,1]·h). So the expansion holds in
D(R^d), and we may apply T term by term. For T = δ this reads
⟨τ_h δ, φ⟩ = φ(h) = Σ_{|α|≤k} (h^α/α!) D^α φ(0) + o(|h|^k),
which is exactly the classical Taylor expansion of φ at 0,
evaluated at h. The distributional Taylor theorem for δ is
nothing but the classical Taylor theorem for the test function, read
through the duality.
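The δ case can be verified in a few lines. We use φ = cos for illustration (not compactly supported, but the identity here is the pointwise classical Taylor theorem); its derivatives at 0 cycle through 1, 0, −1, 0:

```python
import math

# <tau_h delta, phi> = phi(h) should match the classical Taylor sum
# sum_{k<=K} h^k/k! * phi^(k)(0), here with phi = cos.
h, K = 0.3, 12
cycle = [1.0, 0.0, -1.0, 0.0]   # cos^(k)(0) repeats with period 4
taylor = sum(h ** k / math.factorial(k) * cycle[k % 4] for k in range(K + 1))
exact = math.cos(h)             # <tau_h delta, phi> = phi(h)
```

With K = 12 the remainder is of order h^13/13!, far below machine precision for h = 0.3.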
Distributional derivatives exist for every distribution, but they may not
be representable by a function. When they are, we get the notion of a
weak derivative, which bridges distribution theory and Sobolev spaces.
When it exists, the weak derivative is unique (up to a.e. equivalence).
In the language of adjoints: the distributional derivative
D^α T_u is the functional
φ ↦ (−1)^{|α|} ∫ u D^α φ dx. This
always defines an element of D′(Ω). The question is
whether this functional can be represented by a function v via
φ ↦ ∫ vφ dx. When it can, v = D^α u is the weak derivative of u. If moreover
u ∈ L^p and D^α u ∈ L^p for all |α| ≤ k, then u belongs to the Sobolev
space W^{k,p}(Ω): the space of L^p functions whose weak
derivatives up to order k are also in L^p. Sobolev spaces are the natural home for
PDE solutions that are not classically differentiable, and their theory
rests entirely on the distinction between distributional and weak
derivatives.
By Example 3, the distributional derivative of |x| is
sgn(x). Since sgn ∈ L^1_loc(R), this is a genuine weak derivative:
|x|′ = sgn(x) weakly.
Note that |x| is not classically differentiable at the origin, but the
single point {0} has measure zero and does not affect the L^1_loc
function sgn(x).
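The defining identity of this weak derivative, −∫ |x| φ′ dx = ∫ sgn(x) φ dx, is easy to confirm numerically. The off-center bump (our choice) avoids a trivial 0 = 0 by symmetry:

```python
import numpy as np

# Check -integral |x| phi'(x) dx = integral sgn(x) phi(x) dx.
x = np.linspace(-4.0, 4.0, 8001)
dx = x[1] - x[0]

t = x - 0.3                                  # bump centered at 0.3
phi = np.zeros_like(x)
m = np.abs(t) < 1
phi[m] = np.exp(-1.0 / (1.0 - t[m] ** 2))
phi_prime = np.zeros_like(x)                 # analytic derivative of the bump
phi_prime[m] = phi[m] * (-2.0 * t[m]) / (1.0 - t[m] ** 2) ** 2

lhs = -float(np.sum(np.abs(x) * phi_prime) * dx)  # -<T_|x|, phi'>
rhs = float(np.sum(np.sign(x) * phi) * dx)        # <T_sgn, phi>
```

The corner at 0 is invisible to the pairing, as the text explains: {0} has measure zero.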
Example 7 (The Heaviside function has no weak derivative in L^1_loc)
By Example 2, H′ = δ as distributions. But
δ is not a regular distribution: there is no v ∈ L^1_loc(R) with ∫ vφ dx = φ(0)
for all test functions φ. So H does not have a weak
derivative.
This is why H ∈ L^p(Ω) for any bounded Ω but H ∉ W^{1,p}(Ω) for any p: membership in the Sobolev space requires the
weak derivative to exist as a function in L^p.
Example 8 (Characteristic function of an interval)
The function u = 1_{(a,b)} has distributional derivative
u′ = δ_a − δ_b (a difference of point masses). This is a
measure but not in L^1_loc, so u has no weak derivative.
Like the Heaviside function, u ∈ L^p but u ∉ W^{1,p} for any
p.
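The pairing behind Example 8 is a one-line computation: ⟨u′, φ⟩ = −∫_a^b φ′(x) dx = φ(a) − φ(b), which is exactly ⟨δ_a − δ_b, φ⟩. A numerical sketch (endpoints and bump are our choices):

```python
import numpy as np

# For u = 1_(a,b): -<T_u, phi'> should equal phi(a) - phi(b).
a, b = -0.5, 0.25
x = np.linspace(-4.0, 4.0, 8001)
dx = x[1] - x[0]

phi = np.zeros_like(x)                       # bump supported in (-1, 1)
m = np.abs(x) < 1
phi[m] = np.exp(-1.0 / (1.0 - x[m] ** 2))
phi_prime = np.zeros_like(x)                 # analytic derivative of the bump
phi_prime[m] = phi[m] * (-2.0 * x[m]) / (1.0 - x[m] ** 2) ** 2

u = ((x > a) & (x < b)).astype(float)
pairing = -float(np.sum(u * phi_prime) * dx)  # -<T_u, phi'>

def bump(t):
    return float(np.exp(-1.0 / (1.0 - t * t))) if abs(t) < 1 else 0.0

expected = bump(a) - bump(b)                  # <delta_a - delta_b, phi>
```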
The weak derivative extends the classical derivative in the following sense:
If f is classically differentiable (or even just absolutely continuous),
its weak derivative exists and agrees with the classical derivative a.e.
Weak differentiability allows corners and kinks (finitely many, or even
countably many, as long as the derivative remains locally integrable).
Weak differentiability does not allow jumps: a jump discontinuity
produces a delta function in the derivative, which is not in
L^1_loc.
The dividing line is absolute continuity: a function u on an interval
has a weak derivative in L^1_loc if and only if u is
(equivalent to) an absolutely continuous function.
The simplest operations on distributions illustrate the duality principle
with no calculus required.
Recall that for functions on R^d, translation by
h ∈ R^d is (τ_h φ)(x) = φ(x−h) and
reflection is φˇ(x) = φ(−x). Given a distribution
T ∈ D′(R^d), we want to define the translated
distribution τ_h T and the reflected distribution Tˇ.
Definition 4 (Translation and reflection of distributions)
Let T ∈ D′(R^d).
Define U_h : D(R^d) → R by
U_h(φ) = T(τ_{−h} φ). We write τ_h T := U_h and
call it the translation of T by h.
Define V : D(R^d) → R by
V(φ) = T(φˇ). We write Tˇ := V and call
it the reflection of T.
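For a regular distribution T_f, both definitions can be checked against translating or reflecting f directly. A numerical sketch (f, h, the grid, and the off-center bump are all our choices):

```python
import numpy as np

# <tau_h T_f, phi> = <T_f, tau_{-h} phi> and <T_f-check, phi> =
# <T_f, phi-check> should agree with acting on f directly.
x = np.linspace(-4.0, 4.0, 8001)
dx = x[1] - x[0]

t = x - 0.3                                  # off-center bump
phi = np.zeros_like(x)
m = np.abs(t) < 1
phi[m] = np.exp(-1.0 / (1.0 - t[m] ** 2))

f = x * np.exp(-x ** 2)                      # a smooth decaying f
T = lambda psi: float(np.sum(f * psi) * dx)

h = 0.5
via_adjoint_trans = T(np.interp(x + h, x, phi))  # <T, tau_{-h} phi>
direct_trans = float(np.sum((x - h) * np.exp(-(x - h) ** 2) * phi) * dx)

via_adjoint_refl = T(phi[::-1])                  # phi-check on a symmetric grid
direct_refl = float(np.sum(-x * np.exp(-x ** 2) * phi) * dx)  # f(-x) = -f(x)
```

In both cases the adjoint definition reproduces the pairing against the translated or reflected f, which is what justifies the notation τ_h T and Tˇ.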
Proposition 4 (Translation and reflection produce distributions)
τ_h T and Tˇ are distributions, i.e. they belong to
D′(R^d).
Proof 6
We verify the two requirements for U_h = τ_h T; the argument for
V = Tˇ is identical.
Continuity. If φ_n → φ in D(R^d)
(supports in a fixed compact K, all derivatives converge uniformly),
then τ_{−h} φ_n → τ_{−h} φ in
D(R^d) (supports in the fixed compact K − h, all
derivatives still converge uniformly). Since T is continuous,
U_h(φ_n) = T(τ_{−h} φ_n) → T(τ_{−h} φ) = U_h(φ).
Convolution with a test function is a smoothing operation: it always
produces a smooth function, even when applied to a distribution. The
definition uses translation and reflection from the preceding section.
and the convergence holds in D(R^d) (all supports
stay in a fixed compact set, all derivatives converge uniformly).
Applying the continuous linear functional T:
Interchange of derivative and convolution. From the proof above,
D^α(T∗ψ)(x) = ⟨T, D^α_x ψ(x−·)⟩ = (T∗D^α ψ)(x). For the other identity, ⟨D^α T, ψ(x−·)⟩ = (−1)^{|α|}⟨T, D^α_y ψ(x−y)⟩ = ⟨T, D^α_x ψ(x−·)⟩, using the
chain rule D^α_y ψ(x−y) = (−1)^{|α|} D^α_x ψ(x−y).
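The smoothing effect and the interchange identity can be seen concretely with T = H, the Heaviside function: (H∗ψ)(x) = ⟨H, ψ(x−·)⟩ = ∫_{−∞}^x ψ(u) du is a smooth primitive of ψ, and since H′ = δ, its derivative is δ∗ψ = ψ itself. A grid-based sketch (the bump ψ is our choice):

```python
import numpy as np

# (H * psi)(x) = integral of psi up to x; D(H * psi) = (H' * psi) = psi.
x = np.linspace(-4.0, 4.0, 8001)
dx = x[1] - x[0]

psi = np.zeros_like(x)                   # bump supported in (-1, 1)
m = np.abs(x) < 1
psi[m] = np.exp(-1.0 / (1.0 - x[m] ** 2))

H_conv_psi = np.cumsum(psi) * dx         # cumulative integral of psi
deriv = np.gradient(H_conv_psi, dx)      # d/dx (H * psi)

max_err = float(np.max(np.abs(deriv - psi)))   # should recover psi
```

The jump of H has been smoothed into a C^∞ ramp, and differentiating the convolution returns the mollifier, exactly as D(T∗ψ) = (DT)∗ψ predicts.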
This is one of the most useful results in analysis: every distribution
can be approximated by smooth functions. This is the distributional
analogue of sequential density (Proposition 3), made
constructive.