Practice with neural network approximation theory.
Q13.1: Simple Neural Network
Consider a single-neuron network $f(x) = \sigma(wx + b)$, where $\sigma(z) = 1/(1 + e^{-z})$ is the sigmoid activation.
(a) For a few choices of $w$ and $b$ (e.g., $w = 1$, $b = 0$), plot $f(x)$ on an interval around the origin.
(b) Fix $b$ and increase $w$, plotting $f(x)$ each time. What happens as $w \to \infty$?
(c) Explain how adjusting $w$ and $b$ shifts and scales the transition region.
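A minimal plotting sketch for parts (a)–(c); the particular values of $w$ and $b$ below are illustrative choices, not prescribed by the exercise.

```python
import numpy as np
import matplotlib.pyplot as plt

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.linspace(-5, 5, 400)

# (a)/(b): sweep the weight w with b = 0; larger w sharpens the transition.
for w in [1, 2, 5, 20]:
    plt.plot(x, sigmoid(w * x), label=f"w = {w}, b = 0")

# (c): a nonzero bias shifts the transition to x = -b/w.
plt.plot(x, sigmoid(5 * x + 10), "--", label="w = 5, b = 10")

plt.legend()
plt.title("Single sigmoid neuron sigma(w*x + b)")
plt.show()
```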
Q13.2: Approximating a Step Function
Use a sum of sigmoid neurons to approximate the step function $H(x) = 1$ for $x \ge 0$ and $H(x) = 0$ for $x < 0$.
(a) Use a single neuron with large $w$. How close can you get?
(b) Use the difference of two shifted sigmoids to create a “bump.” Plot it for various $w$.
(c) Add several such bumps to approximate a piecewise constant function.
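A sketch of the “bump” construction in part (b); the bump location and the values of $w$ are illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.linspace(-2, 2, 800)

# The difference of two shifted sigmoids is approximately 1 on [a, b] and
# approximately 0 elsewhere; larger w makes the edges sharper.
a, b = -0.5, 0.5
for w in [5, 20, 100]:
    bump = sigmoid(w * (x - a)) - sigmoid(w * (x - b))
    plt.plot(x, bump, label=f"w = {w}")

plt.legend()
plt.title("Bump from the difference of two sigmoid neurons")
plt.show()
```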
Q13.3: ReLU Networks
For the ReLU activation $\sigma(x) = \max(0, x)$:
(a) Show that $\sigma(x) - \sigma(-x) = x$, so a ReLU network can represent linear functions exactly.
(b) How can you build a “tent function” using two ReLU neurons?
(c) Use 4 ReLU neurons to approximate a function that is zero outside a chosen interval and has a peak at its center.
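A sketch of these constructions; the interval $[-1, 1]$ is an illustrative choice, and the four-neuron version shown is a flat-topped (trapezoidal) bump, one way to read part (c).

```python
import numpy as np
import matplotlib.pyplot as plt

def relu(z):
    return np.maximum(0.0, z)

x = np.linspace(-2, 2, 800)

# (b) Two ReLU neurons: rises from x = -1, changes slope at x = 0.
#     This is a tent on [-1, 1], but it keeps decreasing for x > 1.
tent2 = relu(x + 1) - 2 * relu(x)

# (c) Four ReLU neurons: a trapezoidal bump that is exactly zero outside
#     [-1, 1], rises on [-1, -0.5], is flat at 1 on [-0.5, 0.5], and falls
#     back to zero on [0.5, 1].
bump4 = 2 * (relu(x + 1) - relu(x + 0.5) - relu(x - 0.5) + relu(x - 1))

plt.plot(x, tent2, label="2 ReLU neurons")
plt.plot(x, bump4, label="4 ReLU neurons")
plt.legend()
plt.show()
```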
Q13.4: Universal Approximation in Action
Using numpy and optimization (e.g., scipy.optimize.minimize), fit a neural network with $n$ sigmoid neurons to:
(a) a smooth target, e.g. $f(x) = \sin(\pi x)$ on $[-1, 1]$
(b) a target with a kink, e.g. $f(x) = |x|$ on $[-1, 1]$
(c) Compare errors for increasing numbers of neurons $n$ (e.g., $n = 2, 4, 8, 16$).
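A sketch of the fitting loop, using a least-squares loss on a grid of sample points; the target function, parameter initialization, and number of restarts are illustrative choices.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def network(params, x, n):
    # One hidden layer of n sigmoid neurons: sum_i a_i * sigma(w_i * x + b_i).
    w, b, a = params[:n], params[n:2*n], params[2*n:]
    return sigmoid(np.outer(x, w) + b) @ a

def fit(f, n, x):
    y = f(x)
    loss = lambda p: np.mean((network(p, x, n) - y) ** 2)
    best = None
    for _ in range(5):                      # a few random restarts
        p0 = rng.standard_normal(3 * n)
        res = minimize(loss, p0, method="BFGS")
        if best is None or res.fun < best.fun:
            best = res
    return best

x = np.linspace(-1, 1, 200)
for n in [2, 4, 8, 16]:
    res = fit(lambda t: np.sin(np.pi * t), n, x)
    print(f"n = {n:2d}   RMS error = {np.sqrt(res.fun):.2e}")
```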
Q13.5: Dimension Dependence
Consider approximating the Gaussian $f(\mathbf{x}) = e^{-\|\mathbf{x}\|^2}$ in $d$ dimensions.
(a) For polynomial approximation of total degree $p$, how many terms are needed?
(b) A theorem says Gaussians have finite Barron norm. What does this predict about neural network approximation?
(c) Why can’t we numerically verify this for large $d$?
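A quick count for part (a), using the standard formula $\binom{p+d}{d}$ for the number of monomials of total degree at most $p$ in $d$ variables; the values of $p$ and $d$ are illustrative.

```python
from math import comb

# Number of monomials of total degree <= p in d variables.
for d in [1, 2, 5, 10, 20, 50]:
    for p in [4, 8]:
        print(f"d = {d:3d}, degree {p}: {comb(p + d, d):>20,} terms")
```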
Q13.6: Barron Norm Computation
For the 1D Gaussian $f(x) = e^{-x^2/2}$:
(a) Compute the Fourier transform $\hat{f}(\omega)$.
(b) Verify that the Barron constant $C_f = \int_{\mathbb{R}} |\omega|\,|\hat{f}(\omega)|\,d\omega$ is finite.
(c) What does Barron’s theorem predict for the approximation rate?
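A numerical check for parts (a) and (b), assuming the Gaussian target above. The Fourier convention $\hat f(\omega) = \frac{1}{2\pi}\int f(x)\,e^{-i\omega x}\,dx$ is one common choice; other conventions rescale $C_f$ by a constant factor.

```python
import numpy as np
from scipy.integrate import quad

# Assumed target: the 1D Gaussian f(x) = exp(-x^2 / 2).
f = lambda x: np.exp(-x**2 / 2)

# Fourier transform, convention fhat(w) = (1/2pi) * int f(x) e^{-i w x} dx.
# f is even, so the transform is real and only the cosine part is needed.
def fhat(w):
    val, _ = quad(lambda x: f(x) * np.cos(w * x), -12, 12)
    return val / (2 * np.pi)

# Barron constant C_f = int |w| |fhat(w)| dw; the integrand is negligible
# outside [-12, 12] because the transform decays like a Gaussian.
C_f, _ = quad(lambda w: abs(w) * abs(fhat(w)), -12, 12, points=[0])
print(f"numerical C_f ≈ {C_f:.6f}")

# Analytic check: fhat(w) = exp(-w^2/2) / sqrt(2 pi), so C_f = 2 / sqrt(2 pi).
print(f"analytic  C_f ≈ {2 / np.sqrt(2 * np.pi):.6f}")
```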
Q13.7: Comparing Approximation Methods
For a smooth test function, e.g. the Runge function $f(x) = 1/(1 + 25x^2)$ on $[-1, 1]$:
(a) Approximate with a degree-$n$ polynomial using Chebyshev nodes. Plot the maximum error vs. $n$.
(b) Approximate with a neural network with $m$ neurons (use optimization). Plot the error vs. $m$.
(c) Which achieves 6-digit accuracy with fewer parameters?
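A sketch for part (a), using the Runge function as an illustrative target; `chebfit` and `chebval` from `numpy.polynomial.chebyshev` handle the fitting and evaluation.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

f = lambda x: 1.0 / (1.0 + 25.0 * x**2)   # illustrative target (Runge function)
xx = np.linspace(-1, 1, 2000)              # fine grid for measuring the error

for n in [4, 8, 16, 32, 64, 128]:
    # Chebyshev points of the first kind on [-1, 1].
    k = np.arange(n + 1)
    nodes = np.cos((2 * k + 1) * np.pi / (2 * (n + 1)))
    coeffs = C.chebfit(nodes, f(nodes), n)       # degree-n interpolant
    err = np.max(np.abs(C.chebval(xx, coeffs) - f(xx)))
    print(f"n = {n:4d}   max error = {err:.2e}")
```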
Q13.8: Deep vs. Shallow
Consider the $k$-fold composition $f_k = g \circ g \circ \cdots \circ g$ for the tent map $g(x) = 2\min(x, 1 - x)$ on $[0, 1]$ (so $f_k$ is a sawtooth with $2^k$ linear pieces).
(a) A single-hidden-layer ReLU network needs on the order of $2^k$ neurons to represent $f_k$ exactly. Why?
(b) A deep ReLU network with $k$ layers needs only $O(k)$ neurons. Explain by building it layer by layer: $f_k = g \circ f_{k-1}$.
(c) This is an example where depth helps exponentially. Can you think of other such examples?
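A sketch of the layer-by-layer construction in part (b), assuming the tent-map composition above: each layer is the same pair of ReLU neurons, so depth $k$ uses $O(k)$ neurons while producing $2^k$ linear pieces.

```python
import numpy as np
import matplotlib.pyplot as plt

def relu(z):
    return np.maximum(0.0, z)

def tent(x):
    # One hidden layer of 2 ReLU neurons realises the tent map on [0, 1]:
    # g(x) = 2x on [0, 1/2] and 2 - 2x on [1/2, 1].
    return 2 * relu(x) - 4 * relu(x - 0.5)

x = np.linspace(0, 1, 4000)
y = x.copy()
for k in range(1, 4):
    y = tent(y)                      # depth k: k-fold composition
    plt.plot(x, y, label=f"k = {k}  ({2**k} linear pieces)")

plt.legend()
plt.title("Sawtooth f_k built by composing the tent map")
plt.show()
```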
Q13.9: Non-Approximable Functions
Consider a rapidly oscillating function $f(x) = \sin(\omega x)$ with large $\omega$ on $[-1, 1]$.
(a) How many polynomial terms (Chebyshev) are needed to capture this oscillation?
(b) How does the Barron norm scale with frequency?
(c) Is this function “hard” for neural networks? Why or why not?
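A quick experiment for part (a), assuming the $\sin(\omega x)$ form above: it finds the smallest Chebyshev degree that reaches a fixed tolerance for several frequencies.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

xx = np.linspace(-1, 1, 4000)
tol = 1e-6

for omega in [10, 20, 40, 80]:
    f = lambda x: np.sin(omega * x)
    for n in range(2, 400):
        # Degree-n Chebyshev interpolant at first-kind Chebyshev points.
        k = np.arange(n + 1)
        nodes = np.cos((2 * k + 1) * np.pi / (2 * (n + 1)))
        coeffs = C.chebfit(nodes, f(nodes), n)
        if np.max(np.abs(C.chebval(xx, coeffs) - f(xx))) < tol:
            print(f"omega = {omega:3d}: degree {n} reaches error {tol:g}")
            break
```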
Q13.10: Neural Networks for PDEs
Consider a two-point boundary-value problem, e.g. $u''(x) = f(x)$ on $[0, 1]$ with $u(0) = u(1) = 0$.
(a) Write a loss function that measures how well a neural network satisfies the ODE and boundary conditions.
(b) Implement this (known as a “physics-informed neural network” or PINN).
(c) Compare accuracy to spectral collocation. Which is more efficient for this smooth problem?
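A small least-squares PINN sketch using scipy.optimize.minimize, assuming the illustrative problem $u''(x) = -\pi^2 \sin(\pi x)$ with $u(0) = u(1) = 0$ (exact solution $u(x) = \sin(\pi x)$). The network is a single hidden layer of tanh neurons, and its second derivative is computed analytically rather than by automatic differentiation.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n = 10                                        # hidden tanh neurons
xs = np.linspace(0.0, 1.0, 50)                # collocation points
f = lambda x: -np.pi**2 * np.sin(np.pi * x)   # assumed right-hand side

def unpack(p):
    return p[:n], p[n:2*n], p[2*n:]

def u(p, x):
    w, b, a = unpack(p)
    return np.tanh(np.outer(x, w) + b) @ a

def u_xx(p, x):
    # Second derivative of sum_i a_i tanh(w_i x + b_i), computed analytically:
    # d^2/dx^2 tanh(z) = -2 tanh(z) (1 - tanh(z)^2) * w^2.
    w, b, a = unpack(p)
    t = np.tanh(np.outer(x, w) + b)
    return (-2.0 * t * (1.0 - t**2) * w**2) @ a

def loss(p):
    residual = u_xx(p, xs) - f(xs)             # ODE residual at collocation points
    bc = u(p, np.array([0.0, 1.0]))            # boundary-condition mismatch
    return np.mean(residual**2) + 100.0 * np.sum(bc**2)

p0 = 0.5 * rng.standard_normal(3 * n)
res = minimize(loss, p0, method="BFGS", options={"maxiter": 2000})

err = np.max(np.abs(u(res.x, xs) - np.sin(np.pi * xs)))
print(f"final loss {res.fun:.2e}, max error vs exact sin(pi x): {err:.2e}")
```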
Self-Assessment Questions
Test your understanding with these conceptual questions:
Universal Approximation: What does the universal approximation theorem say? What does it NOT say?
Activation Functions: Why do we need nonlinear activation functions? What happens with a linear activation $\sigma(x) = x$?
Curse of Dimensionality: For polynomial approximation in $d$ dimensions with degree $p$, how many terms are needed?
Barron Norm: What is the Barron norm? What property of a function does it measure?
Dimension Independence: Why is Barron’s $O(1/\sqrt{n})$ rate remarkable compared to polynomial rates?
Existence vs. Computation: Barron’s theorem proves good approximations exist. Why doesn’t this immediately solve machine learning?
Deep vs. Shallow: Give an example where depth helps. Does Barron’s theorem apply to deep networks?