The Curse of Dimensionality

Classical approximation theory suffers from the curse of dimensionality:

To approximate a generic $C^k$ function on $[0,1]^d$ to accuracy $\epsilon$, one needs on the order of $\epsilon^{-d/k}$ parameters, a cost that grows exponentially with the dimension $d$.

In high dimensions ($d = 100, 1000, \ldots$), this is catastrophic!

The Barron Norm

Barron identified a function class where neural networks avoid this curse.

Intuition: The Barron norm $C_f = \int_{\mathbb{R}^d} \lVert\omega\rVert \, \lvert\hat{f}(\omega)\rvert \, d\omega$ measures the first moment of the Fourier transform $\hat{f}$. It's finite when the Fourier transform decays fast enough.

Examples

| Function | Barron norm |
|---|---|
| Smooth with compact Fourier support | Finite |
| Gaussian $e^{-\lvert x\rvert^2/2}$ | Finite |
| Ridge function $\sigma(w \cdot x)$ | Finite if $\sigma$ smooth |
| Generic $C^k$ function | May be infinite |
| Discontinuous function | Infinite |
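
As a quick sanity check on the Gaussian entry, here is a minimal sketch that numerically evaluates the first Fourier moment in one dimension and confirms it is finite. It assumes the Fourier convention $\hat{f}(\omega) = \int f(x)\, e^{-i\omega x}\, dx$ (other conventions change the constant, not finiteness), and the helper names are illustrative.

```python
# Minimal 1-D sanity check that the Gaussian f(x) = exp(-x^2/2) has a finite
# Barron norm.  Assumes the Fourier convention  f_hat(w) = \int f(x) e^{-iwx} dx,
# under which  f_hat(w) = sqrt(2*pi) * exp(-w^2/2);  other conventions change
# the constant but not finiteness.  Helper names are illustrative.
import numpy as np
from scipy.integrate import quad

def gaussian_hat(w):
    """Fourier transform of exp(-x^2/2) under the convention above."""
    return np.sqrt(2 * np.pi) * np.exp(-w ** 2 / 2)

def barron_integrand(w):
    """First Fourier moment |w| * |f_hat(w)|, whose integral over R is C_f."""
    return np.abs(w) * np.abs(gaussian_hat(w))

# Integrate over [0, inf) and double, using the symmetry of the integrand.
half, _ = quad(barron_integrand, 0.0, np.inf)
C_f = 2.0 * half

print(f"numerical C_f  = {C_f:.6f}")                      # finite => Gaussian is a Barron function
print(f"analytic value = {2 * np.sqrt(2 * np.pi):.6f}")   # 2*sqrt(2*pi) ≈ 5.013257
```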

Barron’s Theorem

Barron (1993) proved that if $C_f < \infty$, then for every $n$ there exists a one-hidden-layer network $f_n$ with $n$ neurons whose $L^2$ approximation error on a bounded domain is $O(C_f / \sqrt{n})$.

The miracle: The bound depends on $n$ and $C_f$, but not explicitly on the dimension $d$!

The dimension enters only through $C_f$, which can be dimension-independent for many functions of practical interest.

Why This Matters

Polynomial vs. Neural Network

Consider approximating a function on $[0,1]^d$:

| Method | Terms/neurons for $\epsilon$ error |
|---|---|
| Polynomials (generic $C^k$) | $O(\epsilon^{-d/k})$ |
| Neural network (Barron) | $O(C_f^2 / \epsilon^2)$ |

For a function with $C_f = O(1)$: the polynomial count blows up exponentially in $d$, while the network needs only $O(\epsilon^{-2})$ neurons, independent of the dimension.

This explains why neural networks succeed in high-dimensional problems!
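
As a rough illustration, plugging assumed example values ($d = 100$, $k = 2$, $\epsilon = 0.1$, $C_f = 1$, not taken from the text) into the two rows of the table gives:

```python
# Back-of-the-envelope comparison of the two rates in the table above.
# The values of d, k, eps, and C_f are illustrative assumptions.
d, k = 100, 2          # dimension and smoothness order
eps = 0.1              # target accuracy
C_f = 1.0              # Barron norm assumed to be O(1)

poly_terms = eps ** (-d / k)      # O(eps^{-d/k}): generic C^k approximation
nn_neurons = C_f ** 2 / eps ** 2  # O(C_f^2 / eps^2): Barron-class network

print(f"polynomial terms needed : {poly_terms:.1e}")   # ~1e+50
print(f"network neurons needed  : {nn_neurons:.1e}")   # 1e+02, independent of d
```

About $10^{50}$ polynomial terms versus on the order of $10^2$ neurons.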

Proof Idea

The proof is probabilistic:

1. Represent $f$ as an integral:

   $$f(x) = \int a(\omega) \, \sigma(\omega \cdot x + b(\omega)) \, d\mu(\omega)$$

   for some measure $\mu$ related to the Fourier transform.

2. Random sampling: Draw $n$ samples $\omega_1, \ldots, \omega_n$ from $\mu$.

3. Monte Carlo estimate:

   $$f_n(x) = \frac{1}{n} \sum_{k=1}^n a(\omega_k) \, \sigma(\omega_k \cdot x + b(\omega_k))$$

4. Law of large numbers: The approximation error is $O(1/\sqrt{n})$ by standard concentration arguments.

This doesn’t tell us how to find the weights—but it proves they exist!
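
To make the $1/\sqrt{n}$ rate concrete, here is a minimal numerical sketch of steps 2-4. The specific choices ($\sigma = \mathrm{ReLU}$, $\mu$ a standard Gaussian over $\omega$, $a \equiv 1$, $b \equiv 0$, $d = 20$) are illustrative assumptions, not from the text; they are chosen because the resulting integral has a simple closed form to compare against.

```python
# A minimal sketch of the Monte Carlo argument (steps 2-4 above).  The concrete
# choices -- sigma = ReLU, mu = N(0, I) over omega, a(omega) = 1, b(omega) = 0,
# d = 20 -- are illustrative.  For this choice the integral representation has a
# closed form:  f(x) = E_omega[ relu(omega . x) ] = ||x|| / sqrt(2*pi).
import numpy as np

rng = np.random.default_rng(0)
d = 20                                       # input dimension
relu = lambda z: np.maximum(z, 0.0)          # ridge activation sigma

x_test = rng.standard_normal((50, d)) / np.sqrt(d)              # test points, ||x|| ~ 1
f_true = np.linalg.norm(x_test, axis=1) / np.sqrt(2 * np.pi)    # exact target values

# Monte Carlo estimate with n sampled neurons; the error should shrink like 1/sqrt(n).
for n in [10, 100, 1_000, 10_000]:
    omegas = rng.standard_normal((n, d))                # n samples from mu
    f_n = relu(x_test @ omegas.T).mean(axis=1)          # (1/n) sum_k relu(omega_k . x)
    rmse = np.sqrt(np.mean((f_n - f_true) ** 2))
    print(f"n = {n:6d}   RMSE = {rmse:.4f}   RMSE * sqrt(n) = {rmse * np.sqrt(n):.3f}")
```

The last column should stay roughly constant as $n$ grows, which is the $O(1/\sqrt{n})$ rate from step 4 showing up empirically.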

Implications for Deep Learning

Why Training Works (Sometimes)

Gradient descent finds networks that approximate target functions. Barron’s theorem suggests this is possible without exponential complexity—if the target is in the Barron class.

What's NOT in Barron Space?

As the examples table above indicates, functions whose Fourier transform decays too slowly fall outside the class: generic $C^k$ functions of low smoothness may have infinite Barron norm, and discontinuous functions always do. For such targets the dimension-free rate gives no guarantee.

Connection to Classical Approximation

There’s a beautiful analogy:

| Classical | Neural Networks |
|---|---|
| Continuous → Weierstrass theorem | Continuous → Universal approximation |
| Bounded variation → Chebyshev bound | Barron norm → Barron bound |
| $C^k$ → $O(n^{-k})$ convergence | Spectral Barron → $O(n^{-k/2})$ |

Both theories have:

  1. Qualitative density results (existence)

  2. Quantitative rates depending on function smoothness

  3. Optimal vs. achievable distinctions

Limitations

The guarantee is non-constructive: it shows that good weights exist but not how to find them (see the proof idea above). It is stated for shallow networks, and it is vacuous for targets outside the Barron class, where $C_f$ is infinite.

Summary

| Aspect | Result |
|---|---|
| Function class | Barron space: $C_f = \int \lVert\omega\rVert \, \lvert\hat{f}(\omega)\rvert \, d\omega < \infty$ |
| Approximation rate | $O(C_f / \sqrt{n})$ with $n$ neurons |
| Dimension dependence | In $C_f$, not in the rate |
| Key insight | Escapes the curse of dimensionality for Barron functions |
| Practical implication | Explains why NNs work in high-$d$ |

What About Deep Networks?

Barron's theorem applies to shallow (single-hidden-layer) networks. For deep networks, the theory extends further.

See Deep Networks and Neural ODEs for the full treatment.
