# Density and Approximation
A central theme in functional analysis is approximation: given a function in some large space, can we approximate it arbitrarily well by functions from a nicer, more structured class? This question is formalized through the notions of density and separability.
## Dense Subsets

A subset $Y \subset X$ is dense in $X$ if $\bar{Y} = X$. Equivalently, for every $x \in X$ and every $\varepsilon > 0$ there exists $y \in Y$ such that $\|x - y\| < \varepsilon$.
**Example ($\bar{\mathbb{Q}} = \mathbb{R}$).** Let $r \in \mathbb{R}$ and $\varepsilon > 0$. Choose $n$ such that $\frac{1}{n} < \varepsilon$ and define $q_n = \frac{\lfloor nr \rfloor}{n}$. Since $nr - 1 < \lfloor nr \rfloor \leq nr$, this implies that

$$|r - q_n| = \left| r - \frac{\lfloor nr \rfloor}{n} \right| < \frac{1}{n} < \varepsilon.$$
In other words, we can approximate any real number $r$ arbitrarily well by rationals, i.e. the rationals are dense in the reals. This property of the rationals is crucially important for finite-precision arithmetic approximations of the reals on computers.
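The construction above is entirely explicit, so it translates directly into code. A minimal sketch (the helper name `rational_approx` is our own):

```python
import math

def rational_approx(r, eps):
    """Return (p, n) with p = floor(n*r), so that |r - p/n| < 1/n < eps."""
    n = math.ceil(1.0 / eps) + 1  # guarantees 1/n < eps
    p = math.floor(n * r)
    return p, n

p, n = rational_approx(math.pi, 1e-6)
# the rational p/n approximates pi to within 1e-6
```

The error bound follows exactly as in the proof: $\lfloor nr \rfloor \leq nr < \lfloor nr \rfloor + 1$, so $0 \leq r - p/n < 1/n$.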
**Theorem (Weierstrass).** Let $f \in \mathcal{C}^0([-1, 1])$ and $\varepsilon > 0$. Then there exists a polynomial $p$ such that $\|f - p\|_{\infty} < \varepsilon$.
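One classical constructive proof of this theorem uses Bernstein polynomials. The sketch below (our own helper, using NumPy) evaluates the degree-$n$ Bernstein polynomial of $f$ after mapping $[-1,1]$ to $[0,1]$; for the non-smooth function $|x|$, the uniform error visibly shrinks as $n$ grows:

```python
import numpy as np
from math import comb

def bernstein_approx(f, n, x):
    """Evaluate the degree-n Bernstein polynomial of f on [-1, 1] at points x."""
    t = (np.asarray(x) + 1) / 2                # map [-1, 1] -> [0, 1]
    ks = np.arange(n + 1)
    nodes = f(2 * ks / n - 1)                  # sample f at the mapped nodes
    basis = np.array([comb(n, k) * t**k * (1 - t)**(n - k) for k in ks])
    return nodes @ basis                       # sum_k f(x_k) * B_{n,k}(t)

x = np.linspace(-1, 1, 201)
err = np.max(np.abs(np.abs(x) - bernstein_approx(np.abs, 200, x)))
# uniform error of the degree-200 Bernstein polynomial of |x|
```

Bernstein polynomials reproduce affine functions exactly and converge uniformly for any continuous $f$, though the rate is slow for non-smooth targets.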
## Separability

$\mathcal{C}^0([-1, 1])$ has a countable dense subset: the finite sums of monomials from

$$\{ a x^n : a \in \mathbb{Q},\ n \in \mathbb{N} \cup \{ 0 \} \},$$

i.e. the polynomials with rational coefficients. This set is countable and dense in $\mathcal{C}^0([-1, 1])$: we can approximate any continuous function by a polynomial with rational coefficients. Thus $\mathcal{C}^0([-1, 1])$ is a separable Banach space.
The space of square-integrable functions $L^2([-1, 1])$ has the orthogonal basis

$$\{ 1,\ \cos(n \pi x),\ \sin(n \pi x) : n \in \mathbb{N} \}$$

which is countable, and hence $L^2([-1, 1])$ is a separable space.
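We can watch this density numerically: project a function onto finitely many basis elements and observe the $L^2$ error shrink as modes are added. A sketch with NumPy (`fourier_l2_error` is our own helper, using a crude Riemann-sum inner product):

```python
import numpy as np

def fourier_l2_error(f, N, M=4001):
    """L^2([-1,1]) distance from f to its projection onto
    span{1, cos(n pi x), sin(n pi x) : n <= N}."""
    x = np.linspace(-1, 1, M)
    dx = x[1] - x[0]
    ip = lambda u, v: float(np.sum(u * v)) * dx   # Riemann-sum inner product
    fx = f(x)
    proj = ip(fx, np.ones_like(x)) / 2 * np.ones_like(x)  # ||1||^2 = 2
    for n in range(1, N + 1):
        for phi in (np.cos(n * np.pi * x), np.sin(n * np.pi * x)):
            proj = proj + ip(fx, phi) * phi       # ||phi||^2 = 1 on [-1,1]
    return np.sqrt(ip(fx - proj, fx - proj))

e5, e50 = fourier_l2_error(lambda x: x, 5), fourier_l2_error(lambda x: x, 50)
# keeping more modes gives a smaller L^2 error
```

For $f(x) = x$ the projection is a pure sine series, and the error decays like $1/\sqrt{N}$ in $L^2$.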
## Mollification

The key technique for approximating rough functions by smooth ones is mollification: convolution with a smooth bump function that averages a function over a small neighborhood.
This is a powerful result: any continuous function with compact support can be uniformly approximated by smooth, compactly supported functions. The idea is to convolve $f$ with a smooth bump (a mollifier) at a sufficiently small scale.
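A discrete sketch of this construction (our own helper, using NumPy): sample the standard bump $\eta(x) = c\,e^{-1/(1-x^2)}$ rescaled to width $\varepsilon$, and convolve it against samples of a continuous but non-smooth function such as the hat function:

```python
import numpy as np

def mollify(f_vals, x, eps):
    """Convolve samples of f on the uniform grid x with the standard
    mollifier supported on (-eps, eps)."""
    dx = x[1] - x[0]
    s = np.arange(-eps, eps + dx, dx)
    u = np.clip(1 - (s / eps) ** 2, 1e-12, None)
    eta = np.where(np.abs(s) < eps, np.exp(-1.0 / u), 0.0)  # smooth bump
    eta /= eta.sum() * dx                                   # integral = 1
    return np.convolve(f_vals, eta, mode="same") * dx

x = np.linspace(-2, 2, 2001)
hat = np.maximum(0.0, 1 - np.abs(x))   # continuous, compact support, not smooth
smooth = mollify(hat, x, 0.1)          # uniformly close to hat, and smooth
```

Since the hat function is $1$-Lipschitz and the kernel is supported in $(-\varepsilon, \varepsilon)$ with unit mass, the uniform error is at most $\varepsilon$ (plus discretization error).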
## Density of Smooth Functions in $L^p$

Mollification is the engine behind the fundamental density results for $L^p$ spaces.
**Theorem.** Let $\Omega$ be bounded and $1 \leq p < \infty$. Then $\mathcal{C}^0_c(\Omega)$ is dense in $L^p(\Omega)$, i.e.

$$L^p(\Omega) = \overline{\mathcal{C}^0_c(\Omega)}$$

where the closure is with respect to the $p$-norm, and $L^p(\Omega)$ is separable.
The key to the proof is that any measurable function can be approximated by simple functions. Recall that a function is simple if it has the form

$$s_n = \sum_{j = 1}^{n} c_j \chi_{I_j}(x)$$

where $\chi_{I_j}$ is the indicator function of a measurable set $I_j$; from the construction of the Lebesgue integral these sets can be complicated (i.e. consist of several disjoint pieces). Given any non-negative measurable function $f(x)$ we can find an increasing sequence of simple functions (i.e. $s_1(x) \leq s_2(x) \leq \dots \leq f(x)$) such that $s_n(x) \to f(x)$ pointwise for almost every $x$; a general $f$ is handled by splitting it into positive and negative parts.
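The standard construction is explicit: $s_n = \min\!\big(2^{-n} \lfloor 2^n f \rfloor,\ n\big)$, which rounds $f$ down to the dyadic grid and caps its height. A sketch for non-negative samples (NumPy; the helper name is our own):

```python
import numpy as np

def simple_approx(f_vals, n):
    """Dyadic simple-function approximation of a non-negative function:
    round f down to multiples of 2^-n, capped at height n."""
    return np.minimum(np.floor(2.0 ** n * f_vals) / 2.0 ** n, n)

x = np.linspace(0, 1, 1001)
fx = x ** 2
s5, s10 = simple_approx(fx, 5), simple_approx(fx, 10)
# s5 <= s10 <= fx pointwise, and |s10 - fx| <= 2^-10 wherever fx <= 10
```

Refining the dyadic grid can only move the floor upward, which is exactly the monotonicity $s_n \leq s_{n+1} \leq f$ used in the proof.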
Since $|s_n| \leq |f|$, we have $|s_n - f|^p \leq 2^p |f|^p \in L^1$, and the Lebesgue dominated convergence theorem implies that

$$\lim_{n\to\infty} \int |s_n - f|^p \, d\mu = \int \lim_{n\to\infty} |s_n - f|^p \, d\mu = 0.$$
Thus we have that $\|s_n - f\|_p \to 0$.
Finally, note that we can pick the weights $c_j$ to be rational, and each simple function can itself be approximated by continuous functions: mollify each of the characteristic functions.
Combining this with the mollification theorem, we obtain the density of smooth functions in $L^p$.
## Approximation Beyond Polynomials

The classical approximation results above (Weierstrass, mollification, density of $\mathcal{C}^\infty$ in $L^p$) all share a common structure: they identify a "nice" class of functions (polynomials, smooth functions) that can approximate arbitrary elements of a larger space.
A natural question is: what other classes of functions are dense in common function spaces? It turns out that neural networks provide a modern and remarkably powerful answer. The Universal Approximation Theorem (Cybenko, 1989; Hornik, 1991) shows that single-hidden-layer neural networks with sufficiently many neurons are dense in $\mathcal{C}^0(K)$ for any compact $K$, and consequently in $L^p$.
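As a numerical illustration (not the theorem's proof): fix random hidden weights for a one-hidden-layer tanh network and solve for the output weights by linear least squares. The target function, weight scales, and helper name below are all our own illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_shallow_net(y, x, width):
    """One-hidden-layer tanh network: random inner weights and biases,
    output weights fitted by linear least squares."""
    w = rng.normal(scale=3.0, size=width)
    b = rng.uniform(-3.0, 3.0, size=width)
    H = np.tanh(np.outer(x, w) + b)          # hidden-layer activations
    c, *_ = np.linalg.lstsq(H, y, rcond=None)
    return H @ c

x = np.linspace(-1, 1, 400)
y = np.cos(3 * x) + x ** 2
err = np.max(np.abs(y - fit_shallow_net(y, x, 100)))
# with enough neurons the sup error on the grid becomes small
```

This random-features fit only demonstrates the density phenomenon on a grid; the theorem guarantees uniform approximation on all of a compact set.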
From the functional-analytic viewpoint, this is a density result: the set of functions representable by a neural network architecture is dense in the spaces we care about. The proof, which we will see later in the course, is a beautiful application of the Hahn-Banach theorem, one of the central results in duality theory. We will develop this connection in detail in the chapter on Neural Network Connections.