Practice analyzing errors, stability, and conditioning.
Self-Assessment Questions¶
Test your understanding with these conceptual questions:
Machine Epsilon: What happens when you add a number smaller than machine epsilon to 1.0?
Conditioning: What makes evaluating a function near certain inputs ill-conditioned? Can any algorithm fix this?
Stability: Why is computing an expression such as $\sqrt{x+1} - \sqrt{x}$ directly unstable for large $x$? What's a stable alternative?
Machine Epsilon: What is the approximate machine epsilon for single and double precision? What does it tell us about the accuracy of floating point computations?
Subtractive Cancellation: When subtracting two numbers $a$ and $b$, under what conditions is the result reliable?
Trade-offs: In finite differences, why can't we just use an extremely small step size $h$ to get arbitrary accuracy?
Floating-Point Arithmetic¶
Q2.1: Machine Epsilon¶
(a) Write a program to experimentally determine machine epsilon for single (32-bit) and double (64-bit) precision.
(b) Verify your answers against the theoretical values $2^{-24}$ and $2^{-53}$.
(c) Compute $(1.0 + 2^{-53}) - 1.0$ in double precision. What do you get? Why?
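One way to approach part (a) is to halve a candidate until adding it to 1.0 no longer changes the result (a minimal sketch using NumPy scalar types):

```python
import numpy as np

def machine_eps(dtype):
    """Halve eps until 1 + eps/2 no longer differs from 1.
    Returns the spacing 2**-23 (float32) or 2**-52 (float64);
    the unit roundoff quoted in part (b) is half of this value."""
    one = dtype(1.0)
    two = dtype(2.0)
    eps = dtype(1.0)
    while one + eps / two > one:
        eps = eps / two
    return eps

print(machine_eps(np.float32))  # ~1.19e-07 == 2**-23
print(machine_eps(np.float64))  # ~2.22e-16 == 2**-52
```

Note the factor-of-two convention: this loop measures the gap between 1.0 and the next representable number, while $2^{-24}$ and $2^{-53}$ are the corresponding unit roundoffs.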
Q2.2: Floating-Point Surprises¶
(a) Compute `0.1 + 0.1 + 0.1 - 0.3` in Python. Is the result exactly zero? Explain why or why not using machine epsilon.

(b) Why might `sum([0.1] * 10)` not equal `1.0`?

(c) How could you reliably test whether two floating-point numbers are "equal"?
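A quick sketch of (a) and (c): since 0.1 is not representable exactly in binary, the sums leave a tiny residue, and a tolerance-based comparison is the reliable test.

```python
import math

x = 0.1 + 0.1 + 0.1 - 0.3
print(x)         # a tiny nonzero residue on the order of machine epsilon, not 0.0
print(x == 0.0)  # False

# (c) Compare with a tolerance instead of ==:
print(math.isclose(0.1 + 0.1 + 0.1, 0.3))  # True
```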
Q2.3: Summation Order¶
Consider summing the terms $1/n$ for $n = 1, \dots, N$ in floating point.

(a) Compute the sum from $n = 1$ to $N$ (forward).

(b) Compute the sum from $n = N$ down to $1$ (backward).
(c) Which is more accurate? Why? (Hint: think about the relative sizes of terms being added.)
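The mechanism behind the hint can be seen with an extreme example (a sketch in single precision; the series in the exercise differs, but the effect is the same):

```python
import numpy as np

# One large term followed by many tiny ones.
terms = [np.float32(1.0)] + [np.float32(1e-8)] * 10_000

def running_sum(xs):
    """Accumulate left to right in float32."""
    s = np.float32(0.0)
    for x in xs:
        s = np.float32(s + x)
    return s

fwd = running_sum(terms)        # large term first: each 1e-8 is absorbed by 1.0
bwd = running_sum(terms[::-1])  # tiny terms first: they accumulate before the big add

print(fwd)  # 1.0 -- every small term vanished
print(bwd)  # ~1.0001 -- close to the true sum
```

Adding small terms first keeps them comparable in magnitude to the running sum, so fewer low-order bits are lost.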
Q2.4: Kahan Summation¶
Kahan (1965) proposed a compensated summation algorithm that tracks roundoff error:
```python
def kahan_sum(xs):
    s = 0.0  # Running sum
    c = 0.0  # Compensation for lost low-order bits
    for x in xs:
        y = x - c        # Compensate for previous error
        t = s + y        # Add to sum
        c = (t - s) - y  # Recover the roundoff error
        s = t
    return s
```

(a) Implement this algorithm.
(b) Test it on a small example sum using 6-digit decimal arithmetic, tracing through by hand.
(c) What is the role of the variable `c`? Why does it improve accuracy?

(d) Compare Kahan summation to naive summation for the integral, using the trapezoidal rule with $N$ points in single precision (`np.float32`).
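A quick check of the algorithm before tackling (d) (a sketch, not part (d) itself): sum `0.1` a hundred thousand times while accumulating in float32.

```python
import numpy as np

def kahan_sum32(xs):
    """Kahan compensated summation, accumulating in float32."""
    s = np.float32(0.0)
    c = np.float32(0.0)
    for x in xs:
        y = np.float32(x - c)
        t = np.float32(s + y)
        c = np.float32((t - s) - y)
        s = t
    return s

xs = [np.float32(0.1)] * 100_000  # true sum is 10000 (up to the float32(0.1) offset)

naive = np.float32(0.0)
for x in xs:
    naive = np.float32(naive + x)

kahan = kahan_sum32(xs)
print(naive)  # noticeably off from 10000: per-step roundoff accumulates
print(kahan)  # very close to 10000: c captures the lost low-order bits
```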
Condition Numbers¶
Q2.8: Condition Number Computation¶
Compute the relative condition number $\kappa_f(x) = \left| \dfrac{x\, f'(x)}{f(x)} \right|$ for:
(a) at
(b) at and
(c) at and
(d) at for integer
Which evaluations are well-conditioned? Which are ill-conditioned?
For each part, compute $f'(x)$, form $\kappa_f(x) = |x f'(x) / f(x)|$, and evaluate it at the given point; where $\kappa_f$ is of order one, the evaluation is well-conditioned.
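Hand computations can be sanity-checked numerically (a sketch using a centered finite difference for $f'$; the step size `h` is a judgment call, not part of the exercise):

```python
import math

def cond(f, x, h=1e-6):
    """Estimate the relative condition number |x f'(x) / f(x)|
    using a centered finite-difference derivative."""
    df = (f(x + h) - f(x - h)) / (2.0 * h)
    return abs(x * df / f(x))

print(cond(math.sqrt, 2.0))   # ~0.5: sqrt is well-conditioned everywhere
print(cond(math.log, 1.001))  # ~1000: log is ill-conditioned near x = 1
```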
Q2.9: Subtractive Cancellation¶
The expression $\sqrt{x+1} - \sqrt{x}$ suffers from subtractive cancellation for large $x$.

(a) Compute this directly for large $x$ (e.g., $x = 10^{12}$).

(b) Rationalize to get $\dfrac{1}{\sqrt{x+1} + \sqrt{x}}$ and compute again.
(c) What is the relative error of the direct method?
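For example, at a single large value of $x$ (a sketch):

```python
import math

x = 1.0e12
direct = math.sqrt(x + 1) - math.sqrt(x)           # subtracts two nearly equal ~1e6 values
stable = 1.0 / (math.sqrt(x + 1) + math.sqrt(x))   # rationalized form: no cancellation

print(direct)  # only a few correct digits survive the subtraction
print(stable)  # ~5.0e-7, accurate to near machine precision
```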
Q2.10: Trigonometric Reformulation¶
For small $x$, the expression $1 - \cos x$ loses accuracy.

(a) Compute directly.

(b) Use the identity $1 - \cos x = 2\sin^2(x/2)$ and compute again.

(c) Compare to the Taylor approximation $1 - \cos x \approx x^2/2$.
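Assuming the expression in question is $1 - \cos x$ (the canonical example for this identity), a sketch at one small value of $x$:

```python
import math

x = 1e-8
direct = 1.0 - math.cos(x)           # cos(x) rounds to 1.0: all information is lost
stable = 2.0 * math.sin(x / 2.0)**2  # identity form keeps full precision
taylor = x * x / 2.0                 # leading Taylor term x^2/2

print(direct)  # ~0.0 -- near-complete cancellation
print(stable)  # ~5e-17
print(taylor)  # ~5e-17
```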
Stability Analysis¶
Q2.11: Stable Reformulation¶
For each of the following expressions, identify the potential numerical issue and propose a stable reformulation:
(a) for large positive
(b) for small
(c) for small
(d) when
Problem: subtractive cancellation when the two terms are nearly equal.

Stable form: multiply numerator and denominator by the conjugate to eliminate the subtraction.
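If part (d) concerns the quadratic formula (a common instance of this pattern; an assumption here, since the expression isn't shown), the conjugate trick looks like:

```python
import math

def quadratic_roots(a, b, c):
    """Roots of a*x**2 + b*x + c = 0, avoiding cancellation when b**2 >> 4*a*c.
    The larger-magnitude root uses the standard formula; the other root
    comes from the product r1 * r2 = c / a instead of a subtraction."""
    d = math.sqrt(b * b - 4.0 * a * c)
    # -b and the chosen ±d have the same sign, so the addition never cancels.
    q = -0.5 * (b + math.copysign(d, b))
    return q / a, c / q

r1, r2 = quadratic_roots(1.0, 1e8, 1.0)
print(r1, r2)  # ~-1e8 and ~-1e-8; the naive (-b + d)/(2a) form loses
               # most digits of the small root to cancellation
```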
Computational Exercises¶
Q2.12: Trapezoidal Rule with Round-off¶
Consider computing a definite integral $I = \int_a^b f(x)\,dx$ using the trapezoidal rule with $N$ subintervals. The approximation is:

$$ T_N = h \left[ \tfrac{1}{2} f(x_0) + \sum_{i=1}^{N-1} f(x_i) + \tfrac{1}{2} f(x_N) \right], $$

where $h = \dfrac{b - a}{N}$ and $x_i = a + i h$.
(a) Implement this using naive summation with single-precision (32-bit) floats. Plot the relative error vs. $N$. Explain what you observe.

(b) Implement it using Kahan summation. Plot the relative error vs. $N$. Explain what you observe.
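A sketch of how (a) and (b) might be set up, assuming a stand-in integrand $e^x$ on $[0, 1]$ (hypothetical; substitute the exercise's integrand) and forcing float32 accumulation:

```python
import numpy as np

def f(x):
    return np.exp(x)  # hypothetical integrand for illustration only

def trap32(N, kahan=False):
    """Trapezoidal rule on [0, 1] with N subintervals, accumulating in float32."""
    h = np.float32(1.0) / np.float32(N)
    s = np.float32(0.5) * (np.float32(f(0.0)) + np.float32(f(1.0)))
    c = np.float32(0.0)  # Kahan compensation term
    for i in range(1, N):
        term = np.float32(f(i / N))  # evaluate in double, accumulate in single
        if kahan:
            y = np.float32(term - c)
            t = np.float32(s + y)
            c = np.float32((t - s) - y)
            s = t
        else:
            s = np.float32(s + term)
    return float(np.float32(h * s))

exact = float(np.e) - 1.0  # integral of e^x over [0, 1]
for N in (1_000, 100_000):
    err_naive = abs(trap32(N) - exact) / exact
    err_kahan = abs(trap32(N, kahan=True) - exact) / exact
    print(N, err_naive, err_kahan)
```

For large $N$ the naive float32 sum stops improving as roundoff swamps the shrinking truncation error, while the Kahan version keeps the error near single-precision roundoff.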
To force single precision in NumPy:

```python
s = np.asarray(0.0, dtype=np.float32)
```

Q2.13: Fast Inverse Square Root Accuracy¶
Refer to the fast inverse square root code.
(a) Compute the relative error of the fast inverse square root without Newton iteration (just the bit manipulation). Plot the relative error over a range of positive inputs $x$.
(b) Compute the relative error with Newton iterations. How many iterations would you use for a game engine? Support your answer with error plots for 1, 2, and 3 iterations.
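For reference, the bit-level trick can be reproduced in Python (a sketch using `struct` to reinterpret 32-bit float bits; the magic constant is the classic `0x5f3759df`):

```python
import struct

def fast_inv_sqrt(x, iterations=0):
    """Quake-style fast inverse square root on 32-bit floats."""
    i = struct.unpack('<I', struct.pack('<f', x))[0]  # reinterpret float bits as int
    i = 0x5f3759df - (i >> 1)                         # magic initial guess
    y = struct.unpack('<f', struct.pack('<I', i))[0]  # back to float
    for _ in range(iterations):
        y = y * (1.5 - 0.5 * x * y * y)               # Newton step for 1/sqrt(x)
    return y

x = 2.0
approx = fast_inv_sqrt(x, iterations=1)
print(abs(approx * x**0.5 - 1.0))  # relative error after one Newton step, roughly 2e-3
```

Each Newton step roughly squares the relative error, which is why one or two iterations already suffice for graphics work.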