## Overview
We’ve studied approximation with:

- **Polynomials**: Good for 1D, but $O(\epsilon^{-d})$ terms needed in $d$ dimensions
- **Fourier series**: Excellent for periodic problems, same curse of dimensionality
- **Splines**: Practical for low dimensions, exponential growth in high dimensions
Neural networks offer a different approach: they approximate functions using compositions of simple nonlinear functions, achieving dimension-independent convergence rates for certain function classes.
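To make “compositions of simple nonlinear functions” concrete, here is a minimal NumPy sketch of the shallow (one-hidden-layer) networks this chapter studies. All names, sizes, and parameter values below are illustrative assumptions, not a reference implementation:

```python
import numpy as np

def relu(z):
    """Rectified linear unit, applied elementwise."""
    return np.maximum(0.0, z)

def shallow_net(x, W, b, a):
    """One-hidden-layer network: f(x) = sum_k a_k * relu(w_k . x + b_k).

    x: input of shape (d,); W: inner weights (n, d);
    b: biases (n,); a: outer weights (n,).
    """
    return a @ relu(W @ x + b)

# A random 3-neuron network on a 5-dimensional input (illustrative only).
rng = np.random.default_rng(0)
d, n = 5, 3
W, b, a = rng.normal(size=(n, d)), rng.normal(size=n), rng.normal(size=n)
print(shallow_net(rng.normal(size=d), W, b, a))
```

Each neuron applies the same simple nonlinearity to a different affine slice of the input; the network is a weighted sum of these pieces.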
## Why Neural Networks Now?
This chapter connects approximation theory to modern machine learning:
| Classical | Neural Networks |
|---|---|
| Polynomials, Fourier | Compositions of sigmoids/ReLUs |
| Explicit coefficients | Learned parameters |
| Exponential in $d$ | Can be dimension-independent |
| Linear in parameters | Nonlinear optimization |
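The last two table rows are the key practical difference. Here is a toy sketch (NumPy, with arbitrary hyperparameters) of what they mean: polynomial coefficients come from a single linear least-squares solve, while a network’s hidden weights enter nonlinearly and must be found by iteration, e.g. gradient descent:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(-1.0, 1.0, 200)
y = np.sin(3.0 * x)                        # illustrative target function

# Classical: polynomial coefficients solve a *linear* least-squares problem.
V = np.vander(x, 6)                        # degree-5 monomial design matrix
coeffs, *_ = np.linalg.lstsq(V, y, rcond=None)

# Neural: hidden weights enter *nonlinearly*, so we iterate (gradient descent
# on 0.5 * mean squared error; step size and iteration count are arbitrary).
n, lr = 20, 0.05
W, b = rng.normal(size=n), rng.normal(size=n)
a = 0.1 * rng.normal(size=n)
for _ in range(2000):
    z = np.tanh(np.outer(x, W) + b)        # (200, n) hidden activations
    r = z @ a - y                          # residuals f(x_i) - y_i
    g = (1.0 - z**2) * a                   # chain rule through tanh
    a -= lr * z.T @ r / len(x)
    W -= lr * (g * x[:, None]).T @ r / len(x)
    b -= lr * g.T @ r / len(x)

print("poly MSE:", np.mean((V @ coeffs - y) ** 2))
print("net  MSE:", np.mean((np.tanh(np.outer(x, W) + b) @ a - y) ** 2))
```

The least-squares solve is one shot with a guaranteed global optimum; the training loop may need tuning and can stall in a local optimum. That is the “nonlinear optimization” cost in the table.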
## Historical Context
- **1989**: Cybenko proves universal approximation for sigmoid networks
- **1991**: Hornik generalizes to arbitrary bounded nonlinearities
- **1993**: Barron proves dimension-independent rates for the Barron class
- **2010s**: Deep learning revolution; practical success sparks theoretical interest
- **2020s**: Understanding why deep networks work remains active research
## Preview: The Barron Miracle
Classical approximation theory says: to approximate a $d$-dimensional function to accuracy $\epsilon$ using polynomials, you need on the order of $\epsilon^{-d}$ terms.

In $d = 100$ dimensions (modest for machine learning): $\epsilon^{-100}$ terms, completely impractical!

Barron’s theorem says: for a function $f$ with bounded Barron norm, a one-hidden-layer neural network $f_n$ with $n$ neurons achieves

$$\|f - f_n\|_{L^2} \le \frac{C_f}{\sqrt{n}}.$$

The dimension $d$ appears only in the constant $C_f$, not in the convergence rate $O(n^{-1/2})$!
This is why neural networks can handle high-dimensional problems that defeat polynomial methods.
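To see the scale of the gap numerically, here is a quick back-of-the-envelope computation, taking $\epsilon = 0.1$, $d = 100$, and an arbitrary illustrative value for the Barron constant $C_f$ (which in reality is problem-dependent):

```python
import math

eps, d = 0.1, 100

# Curse of dimensionality: ~ eps**(-d) polynomial terms.
poly_terms = eps ** (-d)
print(f"polynomial terms needed: {poly_terms:.1e}")    # ~1.0e+100

# Barron rate: error <= C_f / sqrt(n), so n ~ (C_f / eps)**2 neurons suffice.
C_f = 10.0                     # illustrative Barron constant, not derived
neurons = math.ceil((C_f / eps) ** 2)
print(f"neurons needed (Barron bound): {neurons}")     # 10000
```

A googol of polynomial terms versus ten thousand neurons: the exponent moves from the dimension $d$ onto the constant $C_f$.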
## Learning Outcomes
After completing this chapter, you should be able to:
- **L8.1**: State the universal approximation theorem.
- **L8.2**: Explain why existence doesn’t imply efficient computation.
- **L8.3**: Define the Barron norm and Barron space.
- **L8.4**: State Barron’s approximation theorem.
- **L8.5**: Explain how neural networks escape the curse of dimensionality.
- **L8.6**: Compare neural networks to polynomial approximation.