
Motivation

Computing unit normal vectors is essential for lighting, shading, and reflection calculations in real-time 3D graphics.

Given a vector $\mathbf{v} = (v_1, v_2, v_3)$, its unit vector is:

$$
\hat{\mathbf{v}} = \frac{\mathbf{v}}{\sqrt{v_1^2 + v_2^2 + v_3^2}}
$$

This requires evaluating the inverse square root: $f(x) = \frac{1}{\sqrt{x}}$.
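As a quick sanity check, here is a minimal Python sketch (the helper name `normalize` is ours) that normalizes a vector through the inverse square root:

```python
import math

def normalize(v):
    """Return the unit vector v / |v|, computed via the inverse square root."""
    inv_norm = 1.0 / math.sqrt(sum(c * c for c in v))
    return tuple(c * inv_norm for c in v)

v_hat = normalize((3.0, 0.0, 4.0))   # |v| = 5, so approximately (0.6, 0.0, 0.8)
```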

In the 1990s, when computational power was limited, a fast implementation was crucial for real-time graphics.

The Famous Code

From the Quake III Arena source code:

float Q_rsqrt( float x )
{
    long i;
    float x2, y;
    const float threehalfs = 1.5F;

    x2 = x * 0.5F;
    y  = x;
    i  = * ( long * ) &y;               // evil floating point bit level hacking
    i  = 0x5f3759df - ( i >> 1 );       // what the fuck?
    y  = * ( float * ) &i;
    y  = y * ( threehalfs - ( x2 * y * y ) );   // 1st iteration
    return y;
}

Let’s understand what this does. But first, we need to know how those 32 bits actually encode a number.

How Computers Store Numbers: IEEE 754

Up to this point, we’ve treated floating-point numbers as approximations with a relative error guarantee. That was enough to explain the finite difference trade-off. But the fast inverse square root exploits the bit-level details of how numbers are stored — so now we need to look inside.

Binary Notation

Just as decimal uses powers of 10, binary uses powers of 2 with digits $d_i \in \{0, 1\}$:

$$
d_N d_{N-1} \dots d_1 d_0 \;\longrightarrow\; \sum_{i=0}^{N} d_i \cdot 2^i
$$

For example, $110_2 = 1 \cdot 4 + 1 \cdot 2 + 0 \cdot 1 = 6$. Fractions work the same way: $1.01_2 = 1 + 0 \cdot 2^{-1} + 1 \cdot 2^{-2} = 1.25$.
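These conversions are easy to verify in Python (`int` with base 2 parses binary digit strings):

```python
# 110 in binary: digits weight 4, 2, 1.
assert int("110", 2) == 1 * 4 + 1 * 2 + 0 * 1 == 6

# 1.01 in binary: fractional digits weight 1/2, 1/4.
value = 1 * 2**0 + 0 * 2**-1 + 1 * 2**-2
assert value == 1.25
```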

Scientific Notation in Binary

Just like $245000 = 2.45 \times 10^5$ in decimal, any nonzero binary number can be written in normalized form:

$$
x = \pm 1.d_1 d_2 d_3 \dots \times 2^e
$$

The leading digit is always 1, so it doesn’t need to be stored — a free extra bit of precision!

IEEE 754 Single Precision (32-bit)

The IEEE 754 standard packs a floating-point number into 32 bits with three fields:

| Component | Bits | Role |
|-----------|------|------|
| Sign $S$ | 1 | 0 = positive, 1 = negative |
| Exponent $E$ | 8 | Biased exponent (stored as $E = e + 127$) |
| Mantissa $M$ | 23 | Fractional part $m = M / 2^{23}$ |

The value represented is:

$$
x = (-1)^S \times (1 + m) \times 2^{E - 127}
$$

where $m = M/2^{23} \in [0, 1)$ is the fractional mantissa. The “$1+$” comes from the implicit leading bit.
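The three fields can be extracted in Python with the standard struct module; this is a sketch, and `float_fields` is our own helper name:

```python
import struct

def float_fields(x):
    """Unpack a 32-bit float into (sign, biased exponent E, mantissa bits M)."""
    bits = struct.unpack('>I', struct.pack('>f', x))[0]
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF

# 6.0 = +1.5 x 2^2, so S = 0, E = 2 + 127 = 129, m = M / 2^23 = 0.5
S, E, M = float_fields(6.0)
reconstructed = (-1)**S * (1 + M / 2**23) * 2.0**(E - 127)
assert (S, E, M / 2**23) == (0, 129, 0.5)
assert reconstructed == 6.0
```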

The Floating-Point Number Line

Every floating-point number is a rational number of the form $m \cdot 2^e$, so the set of representable numbers $\mathbb{F} \subset \mathbb{Q}$ is finite. But unlike a uniform grid, floating-point numbers are not evenly spaced. Within each exponent range $[2^e, 2^{e+1})$, the $2^t$ mantissa values divide the interval into equally spaced points, with gap size:

$$
\Delta = 2^{e - t}
$$

where tt is the number of mantissa bits. When the exponent increases by 1, the gap doubles. The result is a logarithmic distribution: dense near zero, sparse for large magnitudes.

The figure below illustrates this for a toy system with 3-bit mantissa and exponents $e \in \{0, 1, 2, 3\}$. Notice how the spacing doubles at each power of 2.

import numpy as np
import matplotlib.pyplot as plt

# Toy floating-point system: 3-bit mantissa, exponents 0..3
t = 3  # mantissa bits
floats = []
for e in range(4):  # exponents 0, 1, 2, 3
    for m in range(2**t):  # mantissa values 0, 1, ..., 7
        x = (1 + m / 2**t) * 2**e
        floats.append(x)

floats = sorted(set(floats))

fig, ax = plt.subplots(figsize=(10, 1.5))
ax.scatter(floats, np.zeros_like(floats), marker='|', s=400, linewidths=2.5, color='C0')

# Mark powers of 2 to show exponent boundaries
for e in range(5):
    ax.axvline(2**e, color='gray', linestyle='--', linewidth=0.7, alpha=0.5)
    ax.text(2**e, 0.25, f'$2^{e}$', ha='center', fontsize=10, color='gray')

ax.set_xlim(0.8, 17)
ax.set_ylim(-0.3, 0.45)
ax.set_xlabel('Value')
ax.set_title('Floating-point numbers (3-bit mantissa): spacing doubles at each power of 2')
ax.set_yticks([])
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.spines['left'].set_visible(False)
plt.tight_layout()
plt.show()
Remark 3 (Floating Point as a Finite Approximation to $\mathbb{R}$)

In analysis, the real numbers $\mathbb{R}$ are constructed as the completion of the rationals $\mathbb{Q}$ — every gap between rationals is filled by taking limits of Cauchy sequences. The reals are the smallest set where every convergent sequence has a limit.

Floating-point arithmetic attempts something analogous with finite resources: approximate $\mathbb{R}$ using a finite subset $\mathbb{F} \subset \mathbb{Q}$, chosen so that every real number has a nearby representative. The logarithmic spacing ensures this works across many orders of magnitude — but unlike $\mathbb{R}$, the gaps never close. Machine epsilon is the price we pay for finiteness: no matter how many bits we use, there is always a smallest relative gap below which distinct real numbers become indistinguishable.

The Strategy: Turn Multiplication into Addition

We want to compute $y = x^{-1/2}$. Taking $\log_2$ of both sides:

$$
\log_2(y) = -\tfrac{1}{2}\log_2(x)
$$

If we had a cheap way to compute $\log_2$, we could replace the expensive power $x^{-1/2}$ with a simple multiply by $-1/2$, then exponentiate back. Of course, computing $\log_2$ is itself expensive — unless the hardware is already doing it for us.

Integer Interpretation of Float Bits

Recall that a 32-bit float stores a sign bit $S_x$, biased exponent $E_x$, and fractional mantissa $m_x$. If we read those same 32 bits as an integer, we get:

$$
\text{Int}(x) = 2^{31} S_x + 2^{23} E_x + M_x
$$

where $M_x = 2^{23} m_x$ is the integer value of the mantissa bits. This is what the C line i = *(long *)&y does — it reinterprets the float’s bits as an integer without changing them.
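In Python we can check this identity directly (`float_to_int_bits` is our stand-in for the C pointer cast):

```python
import struct

def float_to_int_bits(x):
    """Reinterpret a 32-bit float's bits as an unsigned integer,
    the equivalent of i = *(long *)&y in the C code."""
    return struct.unpack('>I', struct.pack('>f', x))[0]

# For x = 6.0: S = 0, E = 129, and M = 2^22 (mantissa fraction 0.5).
S, E, M = 0, 129, 2**22
assert float_to_int_bits(6.0) == 2**31 * S + 2**23 * E + M
```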

Why Integer Bits ≈ Logarithm

From the IEEE 754 representation, a positive float represents:

$$
x = 2^{E_x - 127}(1 + m_x)
$$

Taking $\log_2$:

$$
\log_2(x) = (E_x - 127) + \log_2(1 + m_x)
$$

Since $m_x \in [0, 1)$, we approximate $\log_2(1 + m_x) \approx m_x + \sigma$ where $\sigma \approx 0.0430$. The first-order Taylor expansion gives $\log_2(1 + m_x) \approx m_x / \ln 2 \approx 1.4427\, m_x$, but the algorithm approximates this with slope 1 instead. The constant $\sigma$ partially compensates for the error introduced by dropping the $1/\ln 2$ factor, and is chosen to minimize the approximation error over the entire interval $[0, 1)$. Substituting $m_x = M_x / 2^{23}$:

$$
\log_2(x) \approx \frac{1}{2^{23}}\underbrace{(M_x + 2^{23} E_x)}_{\text{Int}(x)} + (\sigma - 127)
$$

That is:

$$
\log_2(x) \approx \frac{1}{2^{23}} \cdot \text{Int}(x) + (\sigma - 127)
$$

The integer interpretation of float bits is a scaled-and-shifted logarithm. This is the key insight that makes the entire algorithm work.
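A quick numerical check of this approximation, using $\sigma = 0.0430$ as in the text (`approx_log2` is our own name):

```python
import math
import struct

SIGMA = 0.0430  # correction constant from the text

def approx_log2(x):
    """log2(x) ~ Int(x) / 2^23 + (sigma - 127), for positive 32-bit floats."""
    bits = struct.unpack('>I', struct.pack('>f', x))[0]
    return bits / 2**23 + (SIGMA - 127)

# The error stays within a few hundredths across many orders of magnitude.
worst = max(abs(approx_log2(x) - math.log2(x))
            for x in [0.01, 0.5, 1.0, 3.7, 100.0, 1e6])
assert worst < 0.05
```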

The Magic Formula

Now apply this to $\log_2(y) = -\frac{1}{2}\log_2(x)$. Writing both sides in terms of integer bit patterns:

$$
\frac{1}{2^{23}} \text{Int}(y) + (\sigma - 127) = -\frac{1}{2}\left[\frac{1}{2^{23}} \text{Int}(x) + (\sigma - 127)\right]
$$

Solving for $\text{Int}(y)$:

$$
\text{Int}(y) = \underbrace{\tfrac{3}{2} \cdot 2^{23}(127 - \sigma)}_{\texttt{0x5f3759df}} - \frac{1}{2}\,\text{Int}(x)
$$

This is exactly the “what the fuck?” line:

i = 0x5f3759df - (i >> 1);

The magic number 0x5f3759df is just $\frac{3}{2} \cdot 2^{23}(127 - \sigma)$ evaluated for the optimal $\sigma$.
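We can reproduce the constant numerically. Note that $\sigma \approx 0.0450$ used below is an assumed value tuned toward the published constant; the $0.0430$ that best fits the log curve gives a nearby but not identical integer, since the exact optimum depends on the error metric:

```python
SIGMA = 0.0450  # assumed value, chosen to land near the published constant
magic = round(1.5 * 2**23 * (127 - SIGMA))
# magic agrees with 0x5f3759df to within a few hundred (out of ~1.6 billion)
assert abs(magic - 0x5f3759df) < 1024
```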

Newton’s Method Refinement

The bit manipulation gives a rough approximation. To improve accuracy, we apply Newton’s method to:

$$
g(y) = \frac{1}{y^2} - x
$$

The root of $g$ is $y = \frac{1}{\sqrt{x}}$.

With $g'(y) = -\frac{2}{y^3}$, Newton’s iteration simplifies to:

$$
y_{n+1} = y_n - \frac{g(y_n)}{g'(y_n)} = y_n\left(\frac{3}{2} - \frac{x}{2}\,y_n^2\right)
$$

This is exactly the line commented “1st iteration” in the code:

y = y * (threehalfs - (x2 * y * y));

Efficiency

The algorithm is remarkably accurate while using only a handful of cheap operations: one integer shift, one integer subtraction, and a few floating-point multiplications and subtractions.

No expensive division or square root operations are needed. The interactive notebook explores the algorithm in detail and plots the relative error across a wide range of inputs — the bit trick alone achieves ~3.4% relative error, and a single Newton iteration brings this below 0.2%.
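The error figures above can be reproduced with a direct Python port of the routine. Here struct stands in for the C pointer casts, and the Newton step runs in double precision, which barely changes the result:

```python
import math
import struct

def Q_rsqrt(x, newton_iterations=1):
    """Python port of the fast inverse square root."""
    i = struct.unpack('>I', struct.pack('>f', x))[0]   # float bits as integer
    i = 0x5f3759df - (i >> 1)                          # the magic line
    y = struct.unpack('>f', struct.pack('>I', i))[0]   # integer bits back to float
    for _ in range(newton_iterations):
        y = y * (1.5 - 0.5 * x * y * y)                # Newton refinement
    return y

# Worst-case relative error over a few decades of inputs.
xs = [1.001**k for k in range(-5000, 5000)]            # roughly 0.007 .. 148
err_raw = max(abs(Q_rsqrt(x, 0) * math.sqrt(x) - 1) for x in xs)
err_one = max(abs(Q_rsqrt(x, 1) * math.sqrt(x) - 1) for x in xs)
# err_raw comes out around 3.4%; err_one drops below 0.2%
```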

Historical Context and Modern Relevance

This algorithm was essential for real-time 3D graphics in the 1990s. In that era, division and square root were roughly 10–40x slower than multiplication on consumer hardware, so avoiding them entirely was a significant win. The algorithm is attributed to the Quake III Arena source code (id Software, 1999), though similar tricks circulated earlier in the graphics community.

Modern CPUs have dedicated instructions for inverse square root (e.g., rsqrtss on x86), making this trick less necessary. But the underlying idea — that the integer interpretation of float bits approximates a logarithm — remains a beautiful example of how understanding number representation enables creative algorithm design.