The Curse of Dimensionality

Classical approximation theory suffers from the curse of dimensionality:

To approximate a generic $C^k$ function on $[0,1]^d$ to accuracy $\epsilon$, one needs on the order of $\epsilon^{-d/k}$ parameters, a cost that grows exponentially with the dimension $d$.

In high dimensions ($d = 100, 1000, \ldots$), this is catastrophic!

The Barron Norm

Barron identified a function class where neural networks avoid this curse.

Intuition: The Barron norm $C_f = \int_{\mathbb{R}^d} \lVert\omega\rVert \, \lvert\hat{f}(\omega)\rvert \, d\omega$ measures the first moment of the Fourier transform $\hat{f}$. It's finite when the Fourier transform decays fast enough.

Examples

| Function | Barron norm |
|---|---|
| Smooth with compact Fourier support | Finite |
| Gaussian $e^{-\lvert x\rvert^2/2}$ | Finite |
| Ridge function $\sigma(w \cdot x)$ | Finite if $\sigma$ smooth |
| Generic $C^k$ function | May be infinite |
| Discontinuous function | Infinite |
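
As a quick sanity check on the Gaussian entry, here is a minimal sketch that numerically evaluates the first Fourier moment in one dimension and confirms it is finite. It assumes the Fourier convention $\hat{f}(\omega) = \int f(x)\, e^{-i\omega x}\, dx$ (other conventions change the constant, not finiteness), and the helper names are illustrative.

```python
# Minimal 1-D sanity check that the Gaussian f(x) = exp(-x^2/2) has a finite
# Barron norm.  Assumes the Fourier convention  f_hat(w) = \int f(x) e^{-iwx} dx,
# under which  f_hat(w) = sqrt(2*pi) * exp(-w^2/2);  other conventions change
# the constant but not finiteness.  Helper names are illustrative.
import numpy as np
from scipy.integrate import quad

def gaussian_hat(w):
    """Fourier transform of exp(-x^2/2) under the convention above."""
    return np.sqrt(2 * np.pi) * np.exp(-w ** 2 / 2)

def barron_integrand(w):
    """First Fourier moment |w| * |f_hat(w)|, whose integral over R is C_f."""
    return np.abs(w) * np.abs(gaussian_hat(w))

# Integrate over [0, inf) and double, using the symmetry of the integrand.
half, _ = quad(barron_integrand, 0.0, np.inf)
C_f = 2.0 * half

print(f"numerical C_f  = {C_f:.6f}")                      # finite => Gaussian is a Barron function
print(f"analytic value = {2 * np.sqrt(2 * np.pi):.6f}")   # 2*sqrt(2*pi) ≈ 5.013257
```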

Barron’s Theorem

Barron (1993) proved that if $C_f < \infty$, then for every $n$ there exists a one-hidden-layer network $f_n$ with $n$ neurons whose $L^2$ approximation error on a bounded domain is $O(C_f / \sqrt{n})$.

The miracle: The bound depends on $n$ and $C_f$, but not explicitly on the dimension $d$!

The dimension enters only through $C_f$, which can be dimension-independent for many functions of practical interest.

Why This Matters

Polynomial vs. Neural Network

Consider approximating a function on $[0,1]^d$:

| Method | Terms/neurons for $\epsilon$ error |
|---|---|
| Polynomials (generic $C^k$) | $O(\epsilon^{-d/k})$ |
| Neural network (Barron) | $O(C_f^2 / \epsilon^2)$ |

For a function with $C_f = O(1)$: the polynomial count blows up exponentially in $d$, while the network needs only $O(\epsilon^{-2})$ neurons, independent of the dimension.

This explains why neural networks succeed in high-dimensional problems!
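
As a rough illustration, plugging assumed example values ($d = 100$, $k = 2$, $\epsilon = 0.1$, $C_f = 1$, not taken from the text) into the two rows of the table gives:

```python
# Back-of-the-envelope comparison of the two rates in the table above.
# The values of d, k, eps, and C_f are illustrative assumptions.
d, k = 100, 2          # dimension and smoothness order
eps = 0.1              # target accuracy
C_f = 1.0              # Barron norm assumed to be O(1)

poly_terms = eps ** (-d / k)      # O(eps^{-d/k}): generic C^k approximation
nn_neurons = C_f ** 2 / eps ** 2  # O(C_f^2 / eps^2): Barron-class network

print(f"polynomial terms needed : {poly_terms:.1e}")   # ~1e+50
print(f"network neurons needed  : {nn_neurons:.1e}")   # 1e+02, independent of d
```

About $10^{50}$ polynomial terms versus on the order of $10^2$ neurons.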

Proof Idea

The proof is probabilistic:

1. Represent $f$ as an integral:

   $$f(x) = \int a(\omega) \, \sigma(\omega \cdot x + b(\omega)) \, d\mu(\omega)$$

   for some measure $\mu$ related to the Fourier transform.

2. Random sampling: Draw $n$ samples $\omega_1, \ldots, \omega_n$ from $\mu$.

3. Monte Carlo estimate:

   $$f_n(x) = \frac{1}{n} \sum_{k=1}^n a(\omega_k) \, \sigma(\omega_k \cdot x + b(\omega_k))$$

4. Law of large numbers: The approximation error is $O(1/\sqrt{n})$ by standard concentration arguments.

This doesn’t tell us how to find the weights—but it proves they exist!
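
To make the $1/\sqrt{n}$ rate concrete, here is a minimal numerical sketch of steps 2-4. The specific choices ($\sigma = \mathrm{ReLU}$, $\mu$ a standard Gaussian over $\omega$, $a \equiv 1$, $b \equiv 0$, $d = 20$) are illustrative assumptions, not from the text; they are chosen because the resulting integral has a simple closed form to compare against.

```python
# A minimal sketch of the Monte Carlo argument (steps 2-4 above).  The concrete
# choices -- sigma = ReLU, mu = N(0, I) over omega, a(omega) = 1, b(omega) = 0,
# d = 20 -- are illustrative.  For this choice the integral representation has a
# closed form:  f(x) = E_omega[ relu(omega . x) ] = ||x|| / sqrt(2*pi).
import numpy as np

rng = np.random.default_rng(0)
d = 20                                       # input dimension
relu = lambda z: np.maximum(z, 0.0)          # ridge activation sigma

x_test = rng.standard_normal((50, d)) / np.sqrt(d)              # test points, ||x|| ~ 1
f_true = np.linalg.norm(x_test, axis=1) / np.sqrt(2 * np.pi)    # exact target values

# Monte Carlo estimate with n sampled neurons; the error should shrink like 1/sqrt(n).
for n in [10, 100, 1_000, 10_000]:
    omegas = rng.standard_normal((n, d))                # n samples from mu
    f_n = relu(x_test @ omegas.T).mean(axis=1)          # (1/n) sum_k relu(omega_k . x)
    rmse = np.sqrt(np.mean((f_n - f_true) ** 2))
    print(f"n = {n:6d}   RMSE = {rmse:.4f}   RMSE * sqrt(n) = {rmse * np.sqrt(n):.3f}")
```

The last column should stay roughly constant as $n$ grows, which is the $O(1/\sqrt{n})$ rate from step 4 showing up empirically.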

Implications for Deep Learning

Why Training Works (Sometimes)

Gradient descent finds networks that approximate target functions. Barron’s theorem suggests this is possible without exponential complexity—if the target is in the Barron class.

What's NOT in Barron Space?

As the examples table above indicates, functions whose Fourier transform decays too slowly fall outside the class: generic $C^k$ functions of low smoothness may have infinite Barron norm, and discontinuous functions always do. For such targets the dimension-free rate gives no guarantee.

Connection to Classical Approximation

There’s a beautiful analogy:

| Classical | Neural Networks |
|---|---|
| Continuous → Weierstrass theorem | Continuous → Universal approximation |
| Bounded variation → Chebyshev bound | Barron norm → Barron bound |
| $C^k$ → $O(n^{-k})$ convergence | Spectral Barron → $O(n^{-k/2})$ |

Both theories have:

  1. Qualitative density results (existence)

  2. Quantitative rates depending on function smoothness

  3. Optimal vs. achievable distinctions

Limitations

The guarantee is non-constructive: it shows that good weights exist but not how to find them (see the proof idea above). It is stated for shallow networks, and it is vacuous for targets outside the Barron class, where $C_f$ is infinite.

Summary

| Aspect | Result |
|---|---|
| Function class | Barron space: $C_f = \int \lVert\omega\rVert \, \lvert\hat{f}(\omega)\rvert \, d\omega < \infty$ |
| Approximation rate | $O(C_f / \sqrt{n})$ with $n$ neurons |
| Dimension dependence | In $C_f$, not in the rate |
| Key insight | Escapes the curse of dimensionality for Barron functions |
| Practical implication | Explains why NNs work in high-$d$ |

What About Deep Networks?

Barron's theorem applies to shallow (single-hidden-layer) networks. For deep networks, the theory extends further.

See Deep Networks and Neural ODEs for the full treatment.
