## Overview
We’ve studied approximation with:

- **Polynomials**: Good for 1D, but $O(\epsilon^{-d})$ terms needed in $d$ dimensions
- **Fourier series**: Excellent for periodic problems, same curse of dimensionality
- **Splines**: Practical for low dimensions, exponential growth in high dimensions
Neural networks offer a different approach: they approximate functions using compositions of simple nonlinear functions, achieving dimension-independent convergence rates for certain function classes.
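To make “compositions of simple nonlinear functions” concrete, here is a minimal NumPy sketch of the shallow (one-hidden-layer) networks this chapter studies. All names, sizes, and parameter values below are illustrative assumptions, not a reference implementation:

```python
import numpy as np

def relu(z):
    """Rectified linear unit, applied elementwise."""
    return np.maximum(0.0, z)

def shallow_net(x, W, b, a):
    """One-hidden-layer network: f(x) = sum_k a_k * relu(w_k . x + b_k).

    x: input of shape (d,); W: inner weights (n, d);
    b: biases (n,); a: outer weights (n,).
    """
    return a @ relu(W @ x + b)

# A random 3-neuron network on a 5-dimensional input (illustrative only).
rng = np.random.default_rng(0)
d, n = 5, 3
W, b, a = rng.normal(size=(n, d)), rng.normal(size=n), rng.normal(size=n)
print(shallow_net(rng.normal(size=d), W, b, a))
```

Each neuron applies the same simple nonlinearity to a different affine slice of the input; the network is a weighted sum of these pieces.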
## Why Neural Networks Now?
This chapter connects approximation theory to modern machine learning:
| Classical | Neural Networks |
|---|---|
| Polynomials, Fourier | Compositions of sigmoids/ReLUs |
| Explicit coefficients | Learned parameters |
| Exponential in $d$ | Can be dimension-independent |
| Linear in parameters | Nonlinear optimization |
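The last two table rows are the key practical difference. Here is a toy sketch (NumPy, with arbitrary hyperparameters) of what they mean: polynomial coefficients come from a single linear least-squares solve, while a network’s hidden weights enter nonlinearly and must be found by iteration, e.g. gradient descent:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(-1.0, 1.0, 200)
y = np.sin(3.0 * x)                        # illustrative target function

# Classical: polynomial coefficients solve a *linear* least-squares problem.
V = np.vander(x, 6)                        # degree-5 monomial design matrix
coeffs, *_ = np.linalg.lstsq(V, y, rcond=None)

# Neural: hidden weights enter *nonlinearly*, so we iterate (gradient descent
# on 0.5 * mean squared error; step size and iteration count are arbitrary).
n, lr = 20, 0.05
W, b = rng.normal(size=n), rng.normal(size=n)
a = 0.1 * rng.normal(size=n)
for _ in range(2000):
    z = np.tanh(np.outer(x, W) + b)        # (200, n) hidden activations
    r = z @ a - y                          # residuals f(x_i) - y_i
    g = (1.0 - z**2) * a                   # chain rule through tanh
    a -= lr * z.T @ r / len(x)
    W -= lr * (g * x[:, None]).T @ r / len(x)
    b -= lr * g.T @ r / len(x)

print("poly MSE:", np.mean((V @ coeffs - y) ** 2))
print("net  MSE:", np.mean((np.tanh(np.outer(x, W) + b) @ a - y) ** 2))
```

The least-squares solve is one shot with a guaranteed global optimum; the training loop may need tuning and can stall in a local optimum. That is the “nonlinear optimization” cost in the table.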
## Historical Context
- **1989**: Cybenko proves universal approximation for sigmoid networks
- **1991**: Hornik generalizes to arbitrary bounded nonlinearities
- **1993**: Barron proves dimension-independent rates for the Barron class
- **2010s**: Deep learning revolution; practical success sparks theoretical interest
- **2020s**: Understanding why deep networks work remains active research
## Preview: The Barron Miracle
Classical approximation theory says: to approximate a $d$-dimensional function to accuracy $\epsilon$ using polynomials, you need on the order of $\epsilon^{-d}$ terms.

In $d = 100$ dimensions (modest for machine learning): $\epsilon^{-100}$ terms, completely impractical!

Barron’s theorem says: for a function $f$ with bounded Barron norm, a one-hidden-layer neural network $f_n$ with $n$ neurons achieves

$$\|f - f_n\|_{L^2} \le \frac{C_f}{\sqrt{n}}.$$

The dimension $d$ appears only in the constant $C_f$, not in the convergence rate $O(n^{-1/2})$!
This is why neural networks can handle high-dimensional problems that defeat polynomial methods.
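To see the scale of the gap numerically, here is a quick back-of-the-envelope computation, taking $\epsilon = 0.1$, $d = 100$, and an arbitrary illustrative value for the Barron constant $C_f$ (which in reality is problem-dependent):

```python
import math

eps, d = 0.1, 100

# Curse of dimensionality: ~ eps**(-d) polynomial terms.
poly_terms = eps ** (-d)
print(f"polynomial terms needed: {poly_terms:.1e}")    # ~1.0e+100

# Barron rate: error <= C_f / sqrt(n), so n ~ (C_f / eps)**2 neurons suffice.
C_f = 10.0                     # illustrative Barron constant, not derived
neurons = math.ceil((C_f / eps) ** 2)
print(f"neurons needed (Barron bound): {neurons}")     # 10000
```

A googol of polynomial terms versus ten thousand neurons: the exponent moves from the dimension $d$ onto the constant $C_f$.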
## Learning Outcomes
After completing this chapter, you should be able to:
- **L8.1**: State the universal approximation theorem.
- **L8.2**: Explain why existence doesn’t imply efficient computation.
- **L8.3**: Define the Barron norm and Barron space.
- **L8.4**: State Barron’s approximation theorem.
- **L8.5**: Explain how neural networks escape the curse of dimensionality.
- **L8.6**: Compare neural networks to polynomial approximation.