A central theme in functional analysis is approximation: given a function in some large space, can we approximate it arbitrarily well by functions from a nicer, more structured class? This question is formalized through the notions of density and separability.
Dense Subsets
Definition 1 (Dense Subset)
A subset $M \subseteq X$ is dense in $X$ if $\overline{M} = X$. Equivalently, given $x \in X$ and $\varepsilon > 0$, there exists $m \in M$ such that $\|x - m\| < \varepsilon$.
Example 1 (Rationals in Reals)
$\overline{\mathbb{Q}} = \mathbb{R}$. Let $x \in \mathbb{R}$ and $\varepsilon > 0$; then there is $n \in \mathbb{N}$ such that $10^{-n} < \varepsilon$. Define $q = \lfloor 10^n x \rfloor / 10^n \in \mathbb{Q}$, which implies that
$$ |x - q| \le 10^{-n} < \varepsilon. $$
In other words, we can approximate any real number arbitrarily well by rational numbers, i.e. the rationals are dense in the reals. This property of the rationals is crucially important for finite arithmetic approximations of the reals on computers.
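The decimal truncation argument above translates directly into code. The sketch below (using Python's `fractions` module; the helper name is our choice) truncates a real number to $n$ decimal digits and checks that the error stays below $10^{-n}$:

```python
from fractions import Fraction
from math import floor, pi

def rational_truncation(x, n):
    """Return q = floor(10^n * x) / 10^n, a rational number within 10^-n of x."""
    return Fraction(floor(10**n * x), 10**n)

for n in range(1, 6):
    q = rational_truncation(pi, n)
    # |pi - q| <= 10^-n, so the truncations approximate pi arbitrarily well
    assert abs(pi - float(q)) <= 10**(-n)
    print(n, q, abs(pi - float(q)))
```

Exact rational arithmetic via `Fraction` avoids conflating the approximation error $10^{-n}$ with floating-point rounding error.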
Example 2 (Weierstrass Theorem)
Let $f \in C([a,b])$ and let $\varepsilon > 0$; then there exists a polynomial $p$ such that $\|f - p\|_\infty < \varepsilon$.
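One classical constructive proof of the Weierstrass theorem uses Bernstein polynomials. The sketch below (a standard construction, written here only for illustration) evaluates the degree-$n$ Bernstein polynomial of a continuous function on $[0,1]$ and shows the uniform error shrinking as $n$ grows:

```python
import numpy as np
from math import comb

def bernstein(f, n, x):
    """Degree-n Bernstein polynomial: B_n f(x) = sum_k f(k/n) C(n,k) x^k (1-x)^(n-k)."""
    k = np.arange(n + 1)
    coeffs = np.array([comb(n, int(j)) for j in k], dtype=float)
    xx = x[:, None]
    basis = coeffs * xx**k * (1.0 - xx)**(n - k)
    return basis @ f(k / n)

f = lambda t: np.abs(t - 0.5)            # continuous but not differentiable
x = np.linspace(0.0, 1.0, 201)
errors = {}
for n in [4, 16, 64, 256]:
    errors[n] = float(np.max(np.abs(bernstein(f, n, x) - f(x))))
    print(n, errors[n])
```

The target here is deliberately non-smooth; uniform convergence still holds, just at the slower rate $O(n^{-1/2})$ typical near a kink.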
Separability
Definition 2 (Separable Space)
A Banach space which contains a dense countable subset is called separable.
Example 3 (Separability of Real Numbers)
Since the rationals are countable and dense in $\mathbb{R}$, the space $\mathbb{R}$ is separable.
Example 4 (Separability of Continuous Functions)
$C([a,b])$ has the dense subset
$$ M = \Big\{ p(x) = \sum_{k=0}^{n} q_k x^k \;:\; q_k \in \mathbb{Q},\ n \in \mathbb{N} \Big\}. $$
This set is countable and dense in $C([a,b])$: by the Weierstrass theorem we can approximate any continuous function by a polynomial, and perturbing its coefficients to nearby rationals changes the sup-norm by an arbitrarily small amount. Thus $C([a,b])$ is a separable Banach space.
Example 5 (Separability of Square Integrable Functions)
The space $L^2([0,1])$ of square-integrable functions has the countable orthonormal basis (the trigonometric system)
$$ \big\{ e_k(x) = e^{2\pi i k x} \;:\; k \in \mathbb{Z} \big\}. $$
Finite linear combinations of basis elements with rational coefficients form a countable dense subset, and hence $L^2([0,1])$ is a separable space.
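As a numerical illustration (assuming the trigonometric system on $[0,1]$; the grid-based inner product is our simplification), the Fourier partial sums of a discontinuous square-integrable function converge in the $L^2$-norm, even though uniform convergence fails at the jump:

```python
import numpy as np

x = np.linspace(0.0, 1.0, 4000, endpoint=False)
dx = 1.0 / len(x)
f = np.where(x < 0.5, 1.0, -1.0)        # square wave: in L^2 but discontinuous

def inner(g, h):
    return float(np.sum(g * h) * dx)    # grid approximation of the L^2 inner product

def partial_sum(f, N):
    """Orthogonal projection of f onto span{1, cos(2 pi k x), sin(2 pi k x): k <= N}."""
    s = np.full_like(x, inner(f, np.ones_like(x)))
    for k in range(1, N + 1):
        ck, sk = np.cos(2*np.pi*k*x), np.sin(2*np.pi*k*x)
        # cos/sin have squared norm 1/2 on [0,1], hence the factor 2
        s = s + 2.0 * (inner(f, ck) * ck + inner(f, sk) * sk)
    return s

l2_errors = {}
for N in [1, 4, 16, 64]:
    r = f - partial_sum(f, N)
    l2_errors[N] = inner(r, r) ** 0.5
    print(N, round(l2_errors[N], 4))
```

The $L^2$ error decays like $N^{-1/2}$ here, while the sup-norm error stays bounded away from zero near the jump (the Gibbs phenomenon).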
Example 6 (Non-separability of L-infinity)
$L^\infty(\Omega)$ is not separable (see homework).
Mollification
The key technique for approximating rough functions by smooth ones is mollification — convolution with a smooth bump function that averages a function over a small neighborhood.
Theorem 1 (Mollifiers Theorem)
Given $f \in C_0(\mathbb{R}^d)$, for each $\varepsilon > 0$ there exists $f_\varepsilon \in C_0^\infty(\mathbb{R}^d)$ such that
$$ \|f - f_\varepsilon\|_\infty < \varepsilon. $$
This is a powerful result: it tells us that any continuous function with compact support can be uniformly approximated by smooth, compactly supported functions. The idea is to convolve with a smooth bump (a mollifier) at a sufficiently small scale.
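The convolution idea can be checked numerically. The sketch below (a discrete convolution on a grid; the discretization choices are ours) mollifies the hat function with the standard bump kernel and shows the uniform error vanishing as the scale $\varepsilon$ shrinks:

```python
import numpy as np

dx = 1e-3
x = np.arange(-2.0, 2.0, dx)
f = np.maximum(0.0, 1.0 - np.abs(x))    # hat function: continuous, compact support

def mollify(f, eps):
    """Discrete convolution of f with the standard bump eta_eps, normalized to integrate to 1."""
    t = np.arange(-eps, eps + dx / 2, dx)
    inside = np.abs(t) < eps
    eta = np.zeros_like(t)
    eta[inside] = np.exp(-1.0 / (1.0 - (t[inside] / eps) ** 2))
    eta /= eta.sum() * dx               # enforce the normalization ∫ eta = 1 on the grid
    return np.convolve(f, eta, mode="same") * dx

sup_errors = {}
for eps in [0.5, 0.1, 0.02]:
    sup_errors[eps] = float(np.max(np.abs(mollify(f, eps) - f)))
    print(eps, sup_errors[eps])
```

The worst error sits at the kinks of the hat function, where mollification rounds the corner over a neighborhood of width $\sim \varepsilon$; away from the kinks the function is affine and local averaging reproduces it exactly.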
Density of Smooth Functions in $L^p$
Mollification is the engine behind the fundamental density results for $L^p$ spaces.
Theorem 2 (Density of $C_0(\Omega)$ in $L^p(\Omega)$)
Let $\Omega \subset \mathbb{R}^d$ be bounded and let $1 \le p < \infty$. Then $C_0(\Omega)$ is dense in $L^p(\Omega)$, i.e.
$$ \overline{C_0(\Omega)} = L^p(\Omega), $$
where the closure is with respect to the $L^p$-norm, and $L^p(\Omega)$ is separable.
The key to the proof is to recall that any measurable function can be approximated by simple functions. Recall that we say a function $s$ is simple if
$$ s(x) = \sum_{k=1}^{n} c_k \, \mathbb{1}_{E_k}(x), $$
where $\mathbb{1}_{E_k}$ is the indicator function of a measurable set $E_k$; recall from the construction of the Lebesgue integral that these sets can be complicated (i.e. consist of several disjoint pieces). Given any nonnegative measurable function $f$ we can find an increasing sequence of simple functions (i.e. $s_n \le s_{n+1}$) such that $s_n \to f$ pointwise for almost every $x \in \Omega$; for a general $f$, apply this to the positive and negative parts separately.
We have that $|f - s_n|^p \le 2^p |f|^p \in L^1(\Omega)$, and the Lebesgue dominated convergence theorem implies that
$$ \lim_{n \to \infty} \|f - s_n\|_{L^p(\Omega)} = 0. $$
Thus the simple functions are dense in $L^p(\Omega)$. Finally, note that simple functions are not themselves continuous; instead, each characteristic function $\mathbb{1}_{E_k}$ can be approximated in $L^p$ by a continuous function via mollification, and picking the weights $c_k$ to be rational yields a countable dense subset, giving separability.
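The standard dyadic construction of the approximating simple functions can be checked numerically. The sketch below (our illustration, with a grid standing in for $\Omega = [0,1]$) builds $s_n = \min(\lfloor 2^n f \rfloor / 2^n,\, n)$ for a nonnegative $f$ and verifies monotonicity and $L^2$ convergence:

```python
import numpy as np

x = np.linspace(0.0, 1.0, 10001)
f = np.sqrt(x)                          # a nonnegative measurable function on [0,1]

def dyadic_simple(f, n):
    """s_n = min(floor(2^n f)/2^n, n): a simple function taking finitely many values."""
    return np.minimum(np.floor(2.0**n * f) / 2.0**n, float(n))

lp_errors = {}
prev = np.zeros_like(f)
for n in [1, 2, 4, 8]:
    s = dyadic_simple(f, n)
    assert np.all(s >= prev)            # the sequence s_n is increasing
    lp_errors[n] = float(np.mean(np.abs(f - s) ** 2) ** 0.5)   # discrete L^2 error
    print(n, lp_errors[n])
    prev = s
```

Since $0 \le f - s_n \le 2^{-n}$ wherever $f \le n$, the $L^p$ error is bounded by $2^{-n}$ here, consistent with the dominated convergence argument above.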
Combining with the mollifiers theorem, we obtain the density of smooth functions.
Corollary 1 (Density of $C_0^\infty(\Omega)$ in $L^p(\Omega)$)
Let $\Omega \subset \mathbb{R}^d$ be bounded. Then $C_0^\infty(\Omega)$ is dense in $L^p(\Omega)$ for $1 \le p < \infty$:
$$ \overline{C_0^\infty(\Omega)} = L^p(\Omega). $$
Approximation Beyond Polynomials
The classical approximation results above — Weierstrass, mollification, density of $C_0^\infty$ in $L^p$ — all share a common structure: they identify a “nice” class of functions (polynomials, smooth functions) that can approximate arbitrary elements of a larger space.
Remark 1 (Toward Neural Network Approximation)
A natural question is: what other classes of functions are dense in common function spaces? It turns out that neural networks provide a modern and remarkably powerful answer. The Universal Approximation Theorem (Cybenko, 1989; Hornik, 1991) shows that single-hidden-layer neural networks with sufficiently many neurons are dense in $C(K)$ for any compact $K \subset \mathbb{R}^d$, and consequently in $L^p(K)$.
From the functional-analytic viewpoint, this is a density result: the set of functions representable by a neural network architecture is dense in the spaces we care about. The proof, which we will see later in the course, is a beautiful application of the Hahn-Banach theorem — one of the central results in duality theory. We will develop this connection in detail in the chapter on Neural Network Connections.
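As a teaser (a minimal sketch, not the Hahn-Banach argument: we fix random hidden weights and solve only for the output layer by least squares, and all names and scales are our choices), a single-hidden-layer sigmoid network approximates a continuous function increasingly well as the width grows:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 400)
target = np.sin(2 * np.pi * x)          # a continuous function on [0,1]

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_shallow_net(width):
    """Hidden layer sigma(w x + b) with random (w, b); output weights via least squares."""
    w = rng.normal(scale=10.0, size=width)
    b = rng.uniform(-10.0, 10.0, size=width)
    H = sigmoid(np.outer(x, w) + b)     # hidden activations, shape (len(x), width)
    c, *_ = np.linalg.lstsq(H, target, rcond=None)
    return H @ c

sup_err = {}
for width in [2, 8, 32, 128]:
    sup_err[width] = float(np.max(np.abs(fit_shallow_net(width) - target)))
    print(width, sup_err[width])
```

This is a density statement in action: enlarging the dictionary of sigmoid ridge functions shrinks the distance from the target, exactly as enlarging the degree does for polynomials in the Weierstrass theorem.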