# Density and Approximation
A central theme in functional analysis is approximation: given a function in some large space, can we approximate it arbitrarily well by functions from a nicer, more structured class? This question is formalized through the notions of density and separability.
## Dense Subsets

A subset $Y \subset X$ is dense in $X$ if $\bar{Y} = X$. Equivalently, for every $x \in X$ and every $\varepsilon > 0$ there exists $y \in Y$ such that $\|x - y\| < \varepsilon$.
**Example ($\bar{\mathbb{Q}} = \mathbb{R}$).** Let $r \in \mathbb{R}$ and $\varepsilon > 0$. Choose $n$ such that $\frac{1}{n} < \varepsilon$ and define $q_n = \frac{\lfloor nr \rfloor}{n}$. Since $nr - 1 < \lfloor nr \rfloor \leq nr$, this implies that

$$|r - q_n| = \left| r - \frac{\lfloor nr \rfloor}{n} \right| < \frac{1}{n} < \varepsilon.$$
In other words, we can approximate any real number $r$ arbitrarily well by rationals, i.e. the rationals are dense in the reals. This property of the rationals is crucially important for finite-precision arithmetic approximations of the reals on computers.
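The construction above is entirely explicit, so it translates directly into code. A minimal sketch (the helper name `rational_approx` is our own):

```python
import math

def rational_approx(r, eps):
    """Return (p, n) with p = floor(n*r), so that |r - p/n| < 1/n < eps."""
    n = math.ceil(1.0 / eps) + 1  # guarantees 1/n < eps
    p = math.floor(n * r)
    return p, n

p, n = rational_approx(math.pi, 1e-6)
# the rational p/n approximates pi to within 1e-6
```

The error bound follows exactly as in the proof: $\lfloor nr \rfloor \leq nr < \lfloor nr \rfloor + 1$, so $0 \leq r - p/n < 1/n$.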
**Theorem (Weierstrass).** Let $f \in \mathcal{C}^0([-1, 1])$ and $\varepsilon > 0$. Then there exists a polynomial $p$ such that $\|f - p\|_{\infty} < \varepsilon$.
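One classical constructive proof of this theorem uses Bernstein polynomials. The sketch below (our own helper, using NumPy) evaluates the degree-$n$ Bernstein polynomial of $f$ after mapping $[-1,1]$ to $[0,1]$; for the non-smooth function $|x|$, the uniform error visibly shrinks as $n$ grows:

```python
import numpy as np
from math import comb

def bernstein_approx(f, n, x):
    """Evaluate the degree-n Bernstein polynomial of f on [-1, 1] at points x."""
    t = (np.asarray(x) + 1) / 2                # map [-1, 1] -> [0, 1]
    ks = np.arange(n + 1)
    nodes = f(2 * ks / n - 1)                  # sample f at the mapped nodes
    basis = np.array([comb(n, k) * t**k * (1 - t)**(n - k) for k in ks])
    return nodes @ basis                       # sum_k f(x_k) * B_{n,k}(t)

x = np.linspace(-1, 1, 201)
err = np.max(np.abs(np.abs(x) - bernstein_approx(np.abs, 200, x)))
# uniform error of the degree-200 Bernstein polynomial of |x|
```

Bernstein polynomials reproduce affine functions exactly and converge uniformly for any continuous $f$, though the rate is slow for non-smooth targets.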
## Separability

$\mathcal{C}^0([-1, 1])$ has a countable dense subset: the finite sums of monomials from

$$\{ a x^n : a \in \mathbb{Q},\ n \in \mathbb{N} \cup \{ 0 \} \},$$

i.e. the polynomials with rational coefficients. This set is countable and dense in $\mathcal{C}^0([-1, 1])$: we can approximate any continuous function by a polynomial with rational coefficients. Thus $\mathcal{C}^0([-1, 1])$ is a separable Banach space.
The space of square-integrable functions $L^2([-1, 1])$ has the orthogonal basis

$$\{ 1,\ \cos(n \pi x),\ \sin(n \pi x) : n \in \mathbb{N} \}$$

which is countable, and hence $L^2([-1, 1])$ is a separable space.
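We can watch this density numerically: project a function onto finitely many basis elements and observe the $L^2$ error shrink as modes are added. A sketch with NumPy (`fourier_l2_error` is our own helper, using a crude Riemann-sum inner product):

```python
import numpy as np

def fourier_l2_error(f, N, M=4001):
    """L^2([-1,1]) distance from f to its projection onto
    span{1, cos(n pi x), sin(n pi x) : n <= N}."""
    x = np.linspace(-1, 1, M)
    dx = x[1] - x[0]
    ip = lambda u, v: float(np.sum(u * v)) * dx   # Riemann-sum inner product
    fx = f(x)
    proj = ip(fx, np.ones_like(x)) / 2 * np.ones_like(x)  # ||1||^2 = 2
    for n in range(1, N + 1):
        for phi in (np.cos(n * np.pi * x), np.sin(n * np.pi * x)):
            proj = proj + ip(fx, phi) * phi       # ||phi||^2 = 1 on [-1,1]
    return np.sqrt(ip(fx - proj, fx - proj))

e5, e50 = fourier_l2_error(lambda x: x, 5), fourier_l2_error(lambda x: x, 50)
# keeping more modes gives a smaller L^2 error
```

For $f(x) = x$ the projection is a pure sine series, and the error decays like $1/\sqrt{N}$ in $L^2$.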
## Mollification

The key technique for approximating rough functions by smooth ones is mollification: convolution with a smooth bump function that averages a function over a small neighborhood.
This is a powerful result: any continuous function with compact support can be uniformly approximated by smooth, compactly supported functions. The idea is to convolve $f$ with a smooth bump (a mollifier) at a sufficiently small scale.
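A discrete sketch of this construction (our own helper, using NumPy): sample the standard bump $\eta(x) = c\,e^{-1/(1-x^2)}$ rescaled to width $\varepsilon$, and convolve it against samples of a continuous but non-smooth function such as the hat function:

```python
import numpy as np

def mollify(f_vals, x, eps):
    """Convolve samples of f on the uniform grid x with the standard
    mollifier supported on (-eps, eps)."""
    dx = x[1] - x[0]
    s = np.arange(-eps, eps + dx, dx)
    u = np.clip(1 - (s / eps) ** 2, 1e-12, None)
    eta = np.where(np.abs(s) < eps, np.exp(-1.0 / u), 0.0)  # smooth bump
    eta /= eta.sum() * dx                                   # integral = 1
    return np.convolve(f_vals, eta, mode="same") * dx

x = np.linspace(-2, 2, 2001)
hat = np.maximum(0.0, 1 - np.abs(x))   # continuous, compact support, not smooth
smooth = mollify(hat, x, 0.1)          # uniformly close to hat, and smooth
```

Since the hat function is $1$-Lipschitz and the kernel is supported in $(-\varepsilon, \varepsilon)$ with unit mass, the uniform error is at most $\varepsilon$ (plus discretization error).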
## Density of Smooth Functions in $L^p$

Mollification is the engine behind the fundamental density results for $L^p$ spaces.
**Theorem.** Let $\Omega$ be bounded and $1 \leq p < \infty$. Then $\mathcal{C}^0_c(\Omega)$ is dense in $L^p(\Omega)$, i.e.

$$L^p(\Omega) = \overline{\mathcal{C}^0_c(\Omega)}$$

where the closure is with respect to the $p$-norm, and $L^p(\Omega)$ is separable.
The key to the proof is that any measurable function can be approximated by simple functions. Recall that a function is simple if it has the form

$$s_n = \sum_{j = 1}^{n} c_j \chi_{I_j}(x)$$

where $\chi_{I_j}$ is the indicator function of a measurable set $I_j$; from the construction of the Lebesgue integral these sets can be complicated (i.e. consist of several disjoint pieces). Given any non-negative measurable function $f(x)$ we can find an increasing sequence of simple functions (i.e. $s_1(x) \leq s_2(x) \leq \dots \leq f(x)$) such that $s_n(x) \to f(x)$ pointwise for almost every $x$; a general $f$ is handled by splitting it into positive and negative parts.
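The standard construction is explicit: $s_n = \min\!\big(2^{-n} \lfloor 2^n f \rfloor,\ n\big)$, which rounds $f$ down to the dyadic grid and caps its height. A sketch for non-negative samples (NumPy; the helper name is our own):

```python
import numpy as np

def simple_approx(f_vals, n):
    """Dyadic simple-function approximation of a non-negative function:
    round f down to multiples of 2^-n, capped at height n."""
    return np.minimum(np.floor(2.0 ** n * f_vals) / 2.0 ** n, n)

x = np.linspace(0, 1, 1001)
fx = x ** 2
s5, s10 = simple_approx(fx, 5), simple_approx(fx, 10)
# s5 <= s10 <= fx pointwise, and |s10 - fx| <= 2^-10 wherever fx <= 10
```

Refining the dyadic grid can only move the floor upward, which is exactly the monotonicity $s_n \leq s_{n+1} \leq f$ used in the proof.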
Since $|s_n| \leq |f|$, we have $|s_n - f|^p \leq 2^p |f|^p \in L^1$, and the Lebesgue dominated convergence theorem implies that

$$\lim_{n\to\infty} \int |s_n - f|^p \, d\mu = \int \lim_{n\to\infty} |s_n - f|^p \, d\mu = 0.$$
Thus we have that $\|s_n - f\|_p \to 0$.
Finally, note that we can pick the weights $c_j$ to be rational, and each simple function can itself be approximated by continuous functions: mollify each of the characteristic functions.
Combining this with the mollification theorem, we obtain the density of smooth functions in $L^p$.
## Approximation Beyond Polynomials

The classical approximation results above (Weierstrass, mollification, density of $\mathcal{C}^\infty$ in $L^p$) all share a common structure: they identify a "nice" class of functions (polynomials, smooth functions) that can approximate arbitrary elements of a larger space.
A natural question is: what other classes of functions are dense in common function spaces? It turns out that neural networks provide a modern and remarkably powerful answer. The Universal Approximation Theorem (Cybenko, 1989; Hornik, 1991) shows that single-hidden-layer neural networks with sufficiently many neurons are dense in $\mathcal{C}^0(K)$ for any compact $K$, and consequently in $L^p$.
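As a numerical illustration (not the theorem's proof): fix random hidden weights for a one-hidden-layer tanh network and solve for the output weights by linear least squares. The target function, weight scales, and helper name below are all our own illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_shallow_net(y, x, width):
    """One-hidden-layer tanh network: random inner weights and biases,
    output weights fitted by linear least squares."""
    w = rng.normal(scale=3.0, size=width)
    b = rng.uniform(-3.0, 3.0, size=width)
    H = np.tanh(np.outer(x, w) + b)          # hidden-layer activations
    c, *_ = np.linalg.lstsq(H, y, rcond=None)
    return H @ c

x = np.linspace(-1, 1, 400)
y = np.cos(3 * x) + x ** 2
err = np.max(np.abs(y - fit_shallow_net(y, x, 100)))
# with enough neurons the sup error on the grid becomes small
```

This random-features fit only demonstrates the density phenomenon on a grid; the theorem guarantees uniform approximation on all of a compact set.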
From the functional-analytic viewpoint, this is a density result: the set of functions representable by a neural network architecture is dense in the spaces we care about. The proof, which we will see later in the course, is a beautiful application of the Hahn-Banach theorem, one of the central results in duality theory. We will develop this connection in detail in the chapter on Neural Network Connections.