The Curse of Dimensionality¶
Classical approximation theory suffers from the curse of dimensionality:
To approximate a generic function in $C^s([0,1]^d)$ to accuracy $\epsilon$:
Polynomials: need $O(\epsilon^{-d/s})$ terms
Fourier series: need $O(\epsilon^{-d/s})$ terms
Piecewise polynomials: need $O(\epsilon^{-d/s})$ pieces
In high dimensions ($d \gg 1$) this is catastrophic: for a Lipschitz target ($s = 1$) and accuracy $\epsilon = 0.1$, each of these methods needs on the order of $10^d$ parameters.
The Barron Norm¶
Barron identified a function class where neural networks avoid this curse.
Intuition: The Barron norm $C_f = \int_{\mathbb{R}^d} \|\omega\|\,|\hat{f}(\omega)|\,d\omega$ measures the first moment of the Fourier transform $\hat{f}$. It is finite when $\hat{f}$ decays fast enough at high frequencies.
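As a numerical sanity check (a minimal sketch, not part of the original text; the exact constant depends on the Fourier normalization, here $\hat{f}(\omega) = \int f(x)\,e^{-i\omega x}\,dx$), the snippet below estimates $C_f$ for the 1-D Gaussian $f(x) = e^{-x^2/2}$, whose transform is $\hat{f}(\omega) = \sqrt{2\pi}\,e^{-\omega^2/2}$ and whose Barron constant under this convention is exactly $2\sqrt{2\pi} \approx 5.01$, consistent with the Gaussian row in the table below:

```python
import numpy as np

# Estimate C_f = ∫ |ω| |f̂(ω)| dω for the 1-D Gaussian f(x) = exp(-x²/2),
# using the convention f̂(ω) = ∫ f(x) e^{-iωx} dx, so f̂(ω) = √(2π) exp(-ω²/2).
omega = np.linspace(-20.0, 20.0, 400_001)          # tails beyond |ω| = 20 are negligible
f_hat = np.sqrt(2.0 * np.pi) * np.exp(-omega**2 / 2)
dw = omega[1] - omega[0]
C_f = np.sum(np.abs(omega) * np.abs(f_hat)) * dw   # simple Riemann-sum quadrature

print(f"numerical C_f ≈ {C_f:.4f}")                # ≈ 5.0133
print(f"exact 2·√(2π) = {2.0 * np.sqrt(2.0 * np.pi):.4f}")
```

Finiteness is the point here; for a step discontinuity the same integral diverges, which is why discontinuous functions fall outside the class.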
Examples¶
| Function | Barron norm |
|---|---|
| Smooth with compact Fourier support | Finite |
| Gaussian | Finite |
| Ridge function $g(w^\top x)$ | Finite if the profile $g$ is smooth |
| Generic function | May be infinite |
| Discontinuous function | Infinite |
Barron’s Theorem¶
Theorem (Barron, 1993): if $C_f < \infty$, then for every $n$ there is a one-hidden-layer sigmoidal network $f_n$ with $n$ neurons such that $\|f - f_n\|_{L^2(\mu)} \le \frac{2\,C_f}{\sqrt{n}}$, for any probability measure $\mu$ supported on the unit ball.
The miracle: the bound depends on $C_f$ and $n$, but not explicitly on the dimension $d$!
The dimension enters only through $C_f$, which can be dimension-independent for many functions of practical interest.
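To reach a target accuracy $\epsilon$, a one-line rearrangement of the bound (a worked step added here for concreteness, not part of the original statement) gives the required width:

$$\frac{2\,C_f}{\sqrt{n}} \le \epsilon \quad\Longleftrightarrow\quad n \ge \frac{4\,C_f^2}{\epsilon^2},$$

so the neuron count scales like $C_f^2/\epsilon^2$, with no explicit factor of $d$; this is the rate quoted in the comparison below.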
Why This Matters¶
Polynomial vs. Neural Network¶
Consider approximating a function on $[0,1]^d$:
| Method | Terms/neurons for error $\epsilon$ |
|---|---|
| Polynomials (generic $C^1$ target) | $O(\epsilon^{-d})$ |
| Neural network (Barron class) | $O(C_f^2 / \epsilon^2)$ |
For a function with finite Barron norm $C_f$:
Polynomial in $d$ variables: needs $O(\epsilon^{-d})$ terms, exponential in $d$
Neural network: needs $O(C_f^2/\epsilon^2)$ neurons, independent of $d$
This explains why neural networks succeed in high-dimensional problems!
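To make the gap vivid, here is a rough back-of-the-envelope comparison (an illustrative sketch; the values $\epsilon = 0.1$ and $C_f = 10$ are assumptions, not fixed by the theory):

```python
# Parameter counts from the table above: eps**(-d) for a generic C^1 target
# versus C_f**2 / eps**2 for a Barron-class target.
eps = 0.1      # target accuracy (illustrative choice)
C_f = 10.0     # nominal Barron constant of the target (illustrative choice)

print(f"{'d':>4}  {'polynomial ~ eps^-d':>20}  {'Barron NN ~ C_f^2/eps^2':>24}")
for d in (2, 5, 10, 50, 100):
    poly_terms = eps ** (-d)          # blows up exponentially with dimension
    nn_neurons = C_f ** 2 / eps ** 2  # does not depend on d at all
    print(f"{d:>4}  {poly_terms:>20.2e}  {nn_neurons:>24.2e}")
```

The polynomial column reaches $10^{100}$ by $d = 100$, while the network column stays at $10^4$ neurons throughout.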
Proof Idea¶
The proof is probabilistic:
Represent $f$ as an integral over single neurons: $f(x) = \int a\,\sigma(w^\top x + b)\,d\mu(a, w, b)$,
for some probability measure $\mu$ constructed from the Fourier transform of $f$.
Random sampling: Draw $n$ independent samples $(a_i, w_i, b_i)$ from $\mu$.
Monte Carlo estimate: $f_n(x) = \frac{1}{n}\sum_{i=1}^{n} a_i\,\sigma(w_i^\top x + b_i)$, which is exactly a one-hidden-layer network with $n$ neurons.
Law of large numbers: the variance of each term is controlled by $C_f$, so the $L^2$ approximation error is $O(C_f/\sqrt{n})$ by standard concentration arguments.
This doesn’t tell us how to find the weights—but it proves they exist!
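The sketch below illustrates the Monte Carlo step on a concrete case (my own example, not from the original; cosine features stand in for the sigmoidal neurons of Barron's construction). It uses the identity $\mathbb{E}_{w \sim \mathcal{N}(0, I_d)}[\cos(w^\top x)] = e^{-\|x\|^2/2}$, so an average of $n$ random cosine features plays the role of the $n$-neuron network, and its RMS error should shrink roughly like $n^{-1/2}$ regardless of $d$:

```python
import numpy as np

# Monte Carlo / random-features illustration of the O(1/sqrt(n)) rate.
# Identity: E_{w ~ N(0, I_d)}[cos(w·x)] = exp(-||x||²/2), so averaging n random
# cosine "neurons" approximates a Gaussian bump in any dimension d.
rng = np.random.default_rng(0)
d = 20                                         # input dimension
X = rng.uniform(-1.0, 1.0, size=(500, d))      # test points in [-1, 1]^d
f_true = np.exp(-0.5 * np.sum(X**2, axis=1))   # target values exp(-||x||²/2)

for n in (10, 100, 1_000, 10_000):
    W = rng.standard_normal((n, d))            # n sampled directions w_i ~ N(0, I_d)
    f_n = np.cos(X @ W.T).mean(axis=1)         # Monte Carlo average over the n features
    rmse = np.sqrt(np.mean((f_n - f_true) ** 2))
    print(f"n = {n:6d}   RMSE ≈ {rmse:.4f}   (n^-1/2 = {n ** -0.5:.4f})")
```

The constant in front of $n^{-1/2}$ is governed by the variance of a single feature, which is what the Barron norm controls in the general argument.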
Implications for Deep Learning¶
Why Training Works (Sometimes)¶
Gradient descent finds networks that approximate target functions. Barron’s theorem suggests this is possible without exponential complexity—if the target is in the Barron class.
What’s NOT in Barron Space?¶
Functions with discontinuities
Functions with high-frequency oscillations relative to the domain size (see the quick check after this list)
Generic “worst-case” functions
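As a quick check on the oscillation point above (a worked example added here, using the Fourier definition of the Barron norm): for a pure wave $f(x) = \sin(k\,\omega_0^\top x)$, the Fourier transform is concentrated at $\pm k\,\omega_0$, so, up to the normalization convention,

$$C_f = \int \|\omega\|\,|\hat{f}(\omega)|\,d\omega \;\propto\; k\,\|\omega_0\|.$$

The Barron constant grows linearly with the frequency $k$, so the bound $O(C_f^2/\epsilon^2)$ requires on the order of $k^2/\epsilon^2$ neurons for rapidly oscillating targets.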
Connection to Classical Approximation¶
There’s a beautiful analogy:
| Classical | Neural Networks |
|---|---|
| Continuous → Weierstrass theorem | Continuous → Universal approximation |
| Bounded variation → Chebyshev bound | Barron norm → Barron bound |
| $C^s$ smoothness → $O(n^{-s/d})$ convergence | Spectral Barron norm → dimension-free $O(n^{-1/2})$-type rates |
Both theories have:
Qualitative density results (existence)
Quantitative rates depending on function smoothness
Optimal vs. achievable distinctions
Limitations¶
The result is non-constructive: it shows that good weights exist, not that gradient descent will find them.
The rate $O(C_f/\sqrt{n})$ is only useful when $C_f$ is moderate; for targets outside the Barron class (discontinuities, high-frequency oscillations), the constant blows up.
The theorem covers shallow (single hidden layer) networks; depth requires the extensions discussed below.
Summary¶
| Aspect | Result |
|---|---|
| Function class | Barron space: $C_f = \int_{\mathbb{R}^d} \lVert \omega \rVert \, \lvert \hat{f}(\omega) \rvert \, d\omega < \infty$ |
| Approximation rate | $O(C_f / \sqrt{n})$ with $n$ neurons |
| Dimension dependence | Enters through $C_f$, not through the rate |
| Key insight | Escapes curse of dimensionality for Barron functions |
| Practical implication | Explains why NNs work in high-dimensional problems |
What About Deep Networks?¶
Barron’s theorem applies to shallow (single hidden layer) networks. For deep networks, the theory extends to:
Deep Barron spaces: Compositions of Barron-class functions
Neural ODEs: Continuous-depth networks as dynamical systems
Flow-induced spaces: Functions reachable by flows with bounded Barron velocity
See Deep Networks and Neural ODEs for the full treatment.
Further Reading¶
Barron (1993): “Universal approximation bounds for superpositions of a sigmoidal function”
Bach (2017): “Breaking the Curse of Dimensionality with Convex Neural Networks”
Weinan E (2020): “The Mathematical Theory of Deep Learning” (survey)