
Practice with neural network approximation theory.


Q13.1: Simple Neural Network

Consider a single-neuron network: $f(x) = a \cdot \sigma(wx + b)$ where $\sigma(t) = \tanh(t)$.
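
A minimal numpy sketch of this single-neuron model, with arbitrary placeholder values for the parameters $a$, $w$, $b$:

```python
import numpy as np

def single_neuron(x, a, w, b):
    """Single-neuron network f(x) = a * tanh(w*x + b)."""
    return a * np.tanh(w * x + b)

# Evaluate on a small grid with arbitrary parameters a=2, w=3, b=-1.
x = np.linspace(-2.0, 2.0, 5)
print(single_neuron(x, a=2.0, w=3.0, b=-1.0))
```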


Q13.2: Approximating a Step Function

Use a sum of sigmoid neurons to approximate the step function $f(x) = \mathbf{1}_{x > 0}$.
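
One standard starting point (a sketch, not necessarily the intended construction): a single steep sigmoid $\sigma(kx)$ already converges to $\mathbf{1}_{x > 0}$ pointwise as $k \to \infty$ for every $x \neq 0$, and sums of shifted terms of this form then build more general piecewise-constant targets.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def step_approx(x, k=50.0):
    """Approximate the step 1_{x>0} by a single steep sigmoid sigma(k*x)."""
    return sigmoid(k * x)

# Grid chosen to avoid x = 0, where the pointwise limit does not hold.
x = np.linspace(-1.0, 1.0, 8)
print(np.max(np.abs(step_approx(x) - (x > 0))))  # shrinks as k grows
```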


Q13.3: ReLU Networks

For the ReLU activation $\sigma(t) = \max(0, t)$:
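
For intuition, a small sketch (an illustration, not part of the problem statement) showing how three ReLU units combine into a piecewise-linear "hat" function, a common building block in ReLU approximation arguments:

```python
import numpy as np

def relu(t):
    return np.maximum(0.0, t)

def hat(x):
    """Hat function on [0, 1] with peak 1/2 at x = 1/2, built from three ReLU units."""
    return relu(x) - 2.0 * relu(x - 0.5) + relu(x - 1.0)

x = np.linspace(-0.5, 1.5, 9)
print(hat(x))  # zero outside [0, 1], linear up then down inside
```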


Q13.4: Universal Approximation in Action

Using numpy and optimization (e.g., scipy.optimize.minimize), fit a neural network with $n = 5$ sigmoid neurons to:
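
A minimal fitting sketch along these lines. The target below is a hypothetical stand-in ($\sin(\pi x)$); substitute the function given in the problem statement. The loss is non-convex, so several random restarts may be needed.

```python
import numpy as np
from scipy.optimize import minimize

def target(x):
    # Hypothetical placeholder target -- replace with the function from the problem.
    return np.sin(np.pi * x)

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def network(params, x, n=5):
    """Shallow network: sum_i a_i * sigmoid(w_i * x + b_i)."""
    a, w, b = params[:n], params[n:2 * n], params[2 * n:]
    return sigmoid(np.outer(x, w) + b) @ a

def loss(params, x, y):
    return np.mean((network(params, x) - y) ** 2)

rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 200)
y = target(x)
p0 = rng.standard_normal(3 * 5)              # 15 parameters: (a, w, b) for 5 neurons
res = minimize(loss, p0, args=(x, y), method="BFGS")
print("final MSE:", res.fun)
```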


Q13.5: Dimension Dependence

Consider approximating $f(x_1, \ldots, x_d) = \exp(-\|x\|^2)$ on $[-1, 1]^d$.
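
As a back-of-envelope on why classical methods struggle here: the number of monomials of total degree at most $n$ in $d$ variables is $\binom{n+d}{d}$, which grows rapidly with $d$ (the degree $n = 10$ below is an illustrative choice):

```python
from math import comb

n = 10  # polynomial degree (illustrative)
for d in (1, 2, 5, 10, 20):
    # number of monomials of total degree <= n in d variables
    print(d, comb(n + d, d))
```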


Q13.6: Barron Norm Computation

For the 1D function $f(x) = e^{-x^2/2}$:
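
A numerical sketch of the spectral (Barron-type) quantity, assuming the symmetric Fourier convention $\hat{f}(\omega) = (2\pi)^{-1/2} \int f(x)\, e^{-i\omega x}\,dx$, under which the standard Gaussian is its own transform; other normalizations rescale the answer by a constant:

```python
import numpy as np
from scipy.integrate import quad

# Under the symmetric convention, the Fourier transform of exp(-x^2/2) is exp(-w^2/2).
fhat = lambda w: np.exp(-w**2 / 2.0)

# First spectral moment: integral of |w| * |fhat(w)| over the real line.
moment, _ = quad(lambda w: abs(w) * fhat(w), -np.inf, np.inf)
print(moment)  # analytic value: 2
```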


Q13.7: Comparing Approximation Methods

For $f(x) = \frac{1}{1 + 25x^2}$ on $[-1, 1]$:
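
This is the classical Runge function, so one natural comparison (a sketch, with the degree $n = 15$ chosen for illustration) is polynomial interpolation at equispaced versus Chebyshev nodes; a neural-network fit as in Q13.4 could be added as a third method:

```python
import numpy as np

f = lambda x: 1.0 / (1.0 + 25.0 * x**2)     # Runge function
xx = np.linspace(-1.0, 1.0, 2001)           # dense grid for measuring sup error

n = 15
x_eq = np.linspace(-1.0, 1.0, n + 1)                                # equispaced nodes
x_ch = np.cos((2 * np.arange(n + 1) + 1) * np.pi / (2 * (n + 1)))   # Chebyshev nodes

for name, nodes in (("equispaced", x_eq), ("Chebyshev", x_ch)):
    p = np.polynomial.Polynomial.fit(nodes, f(nodes), deg=n)        # interpolant
    print(name, np.max(np.abs(p(xx) - f(xx))))
```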


Q13.8: Deep vs. Shallow

Consider $f(x) = x^{2^L}$ for $L = 4$ (so $f(x) = x^{16}$).
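
A sketch of the depth-efficiency intuition: composing the squaring map $L$ times represents $x^{2^L}$ with $O(L)$ operations, and (in Yarotsky-style constructions) each squaring step can itself be approximated by a small fixed-size ReLU subnetwork, which is the usual route to depth-separation statements.

```python
def deep_power(x, L=4):
    """Compute x**(2**L) by composing the squaring map L times (a depth-L circuit)."""
    y = x
    for _ in range(L):
        y = y * y
    return y

print(deep_power(1.1), 1.1 ** 16)  # both equal x**16 at x = 1.1
```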


Q13.9: Non-Approximable Functions

Consider $f(x) = \sin(1000x)$ on $[0, 1]$.
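
A heuristic spectral computation (hedged: $\sin$ is not integrable on $\mathbb{R}$, so this is the distributional version of the Barron quantity, with the convention $\hat{f}(\xi) = \frac{1}{2\pi}\int f(x)\, e^{-i\xi x}\,dx$): the spectrum of $\sin(\omega x)$ concentrates at $\xi = \pm\omega$, so the first spectral moment scales linearly with the frequency.

```latex
\hat f(\xi) = \frac{1}{2i}\bigl[\delta(\xi - \omega) - \delta(\xi + \omega)\bigr],
\qquad
\int_{\mathbb{R}} |\xi| \, d\lvert \hat f \rvert(\xi) = \omega = 1000 .
```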


Q13.10: Neural Networks for PDEs

Consider the ODE $u'' = -\pi^2 u$ with $u(0) = 0$, $u(1) = 0$.
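
A physics-informed sketch under several stated assumptions: a small tanh network, boundary conditions enforced by the factor $x(1-x)$, the second derivative taken by central finite differences, and a pinning condition $u(1/2) = 1$ added because the stated ODE is an eigenvalue problem whose solutions $u(x) = C \sin(\pi x)$ include $u \equiv 0$.

```python
import numpy as np
from scipy.optimize import minimize

n = 10                                  # hidden tanh units (illustrative)
x = np.linspace(0.0, 1.0, 101)
h = x[1] - x[0]

def u(params, x):
    """Trial solution x*(1-x)*N(x): u(0) = u(1) = 0 hold by construction."""
    a, w, b = params[:n], params[n:2 * n], params[2 * n:]
    net = np.tanh(np.outer(x, w) + b) @ a
    return x * (1.0 - x) * net

def loss(params):
    ux = u(params, x)
    uxx = (ux[2:] - 2.0 * ux[1:-1] + ux[:-2]) / h**2   # central second difference
    residual = uxx + np.pi**2 * ux[1:-1]               # ODE residual u'' + pi^2 u
    pin = (u(params, np.array([0.5]))[0] - 1.0) ** 2   # normalization u(1/2) = 1
    return np.mean(residual**2) + 100.0 * pin          # weight 100 is an arbitrary choice

rng = np.random.default_rng(0)
res = minimize(loss, rng.standard_normal(3 * n), method="BFGS")
print("residual loss:", res.fun)  # with this pinning, the exact solution is sin(pi*x)
```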


Self-Assessment Questions

Test your understanding with these conceptual questions:

  1. Universal Approximation: What does the universal approximation theorem say? What does it NOT say?

  2. Activation Functions: Why do we need nonlinear activation functions? What happens with $\sigma(t) = t$?

  3. Curse of Dimensionality: For polynomial approximation in $d$ dimensions with degree $n$, how many terms are needed?

  4. Barron Norm: What is the Barron norm? What property of a function does it measure?

  5. Dimension Independence: Why is Barron’s $O(1/\sqrt{n})$ rate remarkable compared to polynomial rates?

  6. Existence vs. Computation: Barron’s theorem proves good approximations exist. Why doesn’t this immediately solve machine learning?

  7. Deep vs. Shallow: Give an example where depth helps. Does Barron’s theorem apply to deep networks?