# Numerical Differentiation
Derivatives can be approximated using function values at discrete points. Taylor's theorem tells us exactly how accurate these approximations are, and numerical experiments reveal a fundamental trade-off between truncation error and round-off error.
Suppose we have a function $f(x)$ and we want to compute its derivative at a point $x_0$ (i.e., the slope of the tangent line to the curve $y = f(x)$ at $x_0$).
Recall the limit definition of the derivative:
$$f'(x_0) = \lim_{h\to 0} \frac{f(x_0 + h) - f(x_0)}{h}$$

The left-hand side is the slope of the tangent line, while the right-hand side is the limit of the slope of the secant line connecting the points $(x_0, f(x_0))$ and $(x_0 + h, f(x_0 + h))$.
This suggests we can approximate the derivative using:
$$f'(x_0) \approx \frac{f(x_0 + h) - f(x_0)}{h}$$

for $h$ "small". A major question is: how accurate is this approximation?
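A minimal sketch of this formula in code (the helper name `forward_difference` and the $\sin$ test function are our own choices, not from the notes):

```python
import math

def forward_difference(f, x0, h):
    """Slope of the secant line through (x0, f(x0)) and (x0 + h, f(x0 + h))."""
    return (f(x0 + h) - f(x0)) / h

# Quick check: the derivative of sin at 1.2 is cos(1.2).
approx = forward_difference(math.sin, 1.2, 1e-4)
exact = math.cos(1.2)
print(approx, exact, abs(approx - exact))
```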
## Error Analysis

The forward difference approximation satisfies:
$$\frac{f(x_0 + h) - f(x_0)}{h} = f'(x_0) + E(h)$$

where the error is:

$$E(h) = \frac{h}{2} f''(\xi) = \mathcal{O}(h)$$

for some $\xi \in (x_0, x_0 + h)$. The error is first-order in $h$.
Using Taylor's theorem with $k = 1$:
$$f(x_0 + h) = f(x_0) + f'(x_0) h + \frac{f''(\xi)}{2} h^2$$

where $\xi \in (x_0, x_0 + h)$.
Substituting into our secant line approximation:
$$\begin{split}
\frac{f(x_0 + h) - f(x_0)}{h} &= \frac{1}{h}\left( f(x_0) + h f'(x_0) + \frac{1}{2} h^2 f''(\xi) - f(x_0) \right) \\
&= f'(x_0) + \frac{1}{2} h f''(\xi)
\end{split}$$

**Remark 1 (Computational Verification).** To verify the error theory: if $E(h) = Ch$, then $\log E = \log C + \log h$. Plot $\log E$ vs. $\log h$: you should see a straight line with slope 1.
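A short sketch of this check (our own code, not from the notes; `np.polyfit` on the log-log data estimates the slope):

```python
import numpy as np

def forward_difference(f, x0, h):
    return (f(x0 + h) - f(x0)) / h

x0 = 1.2
hs = np.logspace(-1, -5, 9)  # moderate h, before round-off takes over
errors = np.abs(np.array([forward_difference(np.sin, x0, h) for h in hs]) - np.cos(x0))

# Slope of the best-fit line through (log h, log E); theory predicts ~1.
slope = np.polyfit(np.log(hs), np.log(errors), 1)[0]
print(f"estimated order: {slope:.2f}")
```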
*Figure: absolute error of the forward difference approximation of the derivative of $\sin(x)$ at $x_0 = 1.2$.*
## The Round-off Error Trade-off

The plot above shows something surprising: the error increases for very small $h$!
This is our first encounter with a fundamental phenomenon in scientific computing: **round-off error**. When $h$ becomes very small, we're subtracting two nearly equal numbers ($f(x_0 + h) \approx f(x_0)$) and then dividing by a tiny $h$. This amplifies the small errors inherent in floating-point arithmetic.
In the next chapter on floating-point arithmetic, we'll learn:

- Why computers can't represent most real numbers exactly
- What machine epsilon is and why it's approximately $10^{-16}$ for double precision
- How to find the optimal step size that balances truncation and round-off error
For now, the practical takeaway is: **smaller $h$ is not always better**. For first-order finite differences, $h \approx 10^{-8}$ is often optimal.
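A quick illustration of that takeaway (our own sketch, for `float64` and the same $\sin$ example as above):

```python
import math

def forward_difference(f, x0, h):
    return (f(x0 + h) - f(x0)) / h

exact = math.cos(1.2)
for h in (1e-4, 1e-8, 1e-12):
    err = abs(forward_difference(math.sin, 1.2, h) - exact)
    print(f"h = {h:.0e}  error = {err:.1e}")
# The error shrinks from h = 1e-4 to h = 1e-8 (truncation dominates),
# then grows again at h = 1e-12 (round-off dominates).
```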
The forward difference only uses information to the right of $x_0$. From geometric intuition, using points on both sides should give a better approximation to the tangent line: the secant line through $(x_0 - h, f(x_0 - h))$ and $(x_0 + h, f(x_0 + h))$ is more symmetric.
The central difference approximation:
$$f'(x_0) \approx \frac{f(x_0 + h) - f(x_0 - h)}{2h}$$

has error $E(h) = \mathcal{O}(h^2)$, i.e. second-order in $h$: halving $h$ now cuts the error by a factor of four rather than two.
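A minimal sketch of the central difference (our own code; while truncation dominates, each halving of $h$ should roughly quarter the error):

```python
import math

def central_difference(f, x0, h):
    """Slope of the symmetric secant through x0 - h and x0 + h."""
    return (f(x0 + h) - f(x0 - h)) / (2 * h)

exact = math.cos(1.2)
for h in (1e-2, 5e-3, 2.5e-3):
    err = abs(central_difference(math.sin, 1.2, h) - exact)
    print(f"h = {h:.1e}  error = {err:.2e}")  # drops ~4x per halving: O(h^2)
```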
We expand both $f(x_0 + h)$ and $f(x_0 - h)$ using Taylor series:
$$f(x_0 \pm h) = f(x_0) \pm h f'(x_0) + \frac{h^2}{2}f''(x_0) \pm \frac{h^3}{6} f'''(\xi^{\pm}) + O(h^4)$$

Substituting into the central difference formula:
$$\begin{split}
\frac{f(x_0 + h) - f(x_0 - h)}{2h} &= \frac{1}{2h}\left( 2hf'(x_0) + \frac{h^3}{6}\left( f'''(\xi^+) + f'''(\xi^-) \right) + O(h^5)\right) \\
&= f'(x_0) + \frac{h^2}{12} \left( f'''(\xi^-) + f'''(\xi^+) \right) + O(h^4)
\end{split}$$

Why the improvement? The even-powered terms ($h^2, h^4, \ldots$) cancel due to symmetry.
*Figure: central finite difference error for approximating $\tan'(0.2)$. The error follows $O(h^2)$ (red line) until round-off error dominates at $h \approx 10^{-6}$.*
## Comparison: Forward vs Central

| Method  | Formula | Error | Order |
|---------|---------|-------|-------|
| Forward | $\frac{f(x_0 + h) - f(x_0)}{h}$ | $O(h)$ | 1st |
| Central | $\frac{f(x_0 + h) - f(x_0 - h)}{2h}$ | $O(h^2)$ | 2nd |
**Example 1 (Accuracy Comparison).** For $h = 0.01$: the forward difference error scales like $h = 10^{-2}$, while the central difference error scales like $h^2 = 10^{-4}$. With comparable constants, the central difference is roughly 100× more accurate for the same step size!
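A concrete check of this example (our own sketch with $f = \sin$ at $x_0 = 1.2$; the exact ratio depends on the function's derivatives):

```python
import math

x0, h = 1.2, 0.01
exact = math.cos(x0)  # derivative of sin

fwd = (math.sin(x0 + h) - math.sin(x0)) / h
ctr = (math.sin(x0 + h) - math.sin(x0 - h)) / (2 * h)

print(f"forward error: {abs(fwd - exact):.1e}")
print(f"central error: {abs(ctr - exact):.1e}")
# The orders alone predict a ~100x gap at h = 0.01; the constants
# (f'' for forward vs f''' for central) shift the exact ratio.
```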
**Remark 2 (Optimal Step Size for Central Differences).** The total error (truncation + round-off) is:
$$E(h) = \frac{M_0 \mu}{h} + M_1 h^2$$

Setting $E'(h) = -M_0 \mu / h^2 + 2 M_1 h = 0$ and solving gives $h_{\text{opt}} = \left( \frac{M_0 \mu}{2 M_1} \right)^{1/3} \approx \mu^{1/3} \approx 6 \times 10^{-6}$ for double precision.
This is larger than for forward differences because the truncation error decreases faster ($h^2$ vs $h$).
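A numerical sanity check (our own sketch; `np.finfo(float).eps` stands in for $\mu$, and we simply scan for the best $h$):

```python
import numpy as np

mu = np.finfo(float).eps            # ~2.2e-16 for double precision
print(f"predicted h_opt ~ {mu ** (1 / 3):.0e}")  # ~6e-6

x0 = 1.2
exact = np.cos(x0)
hs = np.logspace(-1, -10, 200)
errors = np.abs((np.sin(x0 + hs) - np.sin(x0 - hs)) / (2 * hs) - exact)
print(f"best h found by scanning: {hs[np.argmin(errors)]:.0e}")
```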
## Second Derivative Approximation

The central difference approximation for the second derivative:
$$f''(x_0) \approx \frac{f(x_0 + h) - 2f(x_0) + f(x_0 - h)}{h^2}$$

has error $O(h^2)$.
Adding the Taylor expansions (instead of subtracting):
$$f(x_0 + h) + f(x_0 - h) = 2f(x_0) + h^2 f''(x_0) + O(h^4)$$

Solving for $f''(x_0)$ gives the formula above.
This formula is fundamental in numerical PDEs: it's the discrete Laplacian in one dimension.
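A minimal sketch (our own code) checking the formula against a function whose second derivative is known:

```python
import math

def second_difference(f, x0, h):
    """Central difference for f''(x0): the discrete Laplacian in one dimension."""
    return (f(x0 + h) - 2 * f(x0) + f(x0 - h)) / h**2

# f(x) = sin(x) has f''(x) = -sin(x).
x0 = 1.2
exact = -math.sin(x0)
for h in (1e-1, 1e-2, 1e-3):
    err = abs(second_difference(math.sin, x0, h) - exact)
    print(f"h = {h:.0e}  error = {err:.1e}")  # ~100x smaller per decade: O(h^2)
```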
## Summary

| Method | Formula | Error Order |
|--------|---------|-------------|
| Forward difference | $\frac{f(x+h) - f(x)}{h}$ | $O(h)$ |
| Central difference | $\frac{f(x+h) - f(x-h)}{2h}$ | $O(h^2)$ |
| Second derivative | $\frac{f(x+h) - 2f(x) + f(x-h)}{h^2}$ | $O(h^2)$ |
Key principle: Using symmetric stencils improves accuracy, but there are diminishing returns due to round-off error. The optimal step size balances truncation and round-off errors.