The Weak* Topology and Banach-Alaoglu

Big Idea

The dual space $X^*$ is itself a Banach space, so it has a weak topology. But that weak topology requires the bidual $X^{**}$ , which can be enormous. The weak* topology sidesteps the bidual entirely by testing only against elements of $X$ . This gives a coarser topology with fewer open sets, making compactness easier, and leads to the Banach-Alaoglu theorem.

We now want a weak-type topology on the dual space $X^*$ , for the same reason we wanted one on $X$ : to gain compactness. Since $X^*$ is a Banach space in its own right, the most natural first attempt is to apply the same construction as before. Weak convergence of a sequence $(f_n)$ in $X^*$ would then mean

\phi(f_n) \to \phi(f) \quad \text{for every } \phi \in (X^*)^* = X^{**}.

(1)

This requires testing against every element of the bidual $X^{**}$ .

Example 1 (The bidual can be much larger than $X$)

Now recall the canonical embedding $J : X \hookrightarrow X^{**}$ , which identifies each $x \in X$ with the evaluation functional $J(x)(f) = f(x)$ . Every element of $X$ already lives inside $X^{**}$ . So instead of testing against all of $X^{**}$ , we could test against just $J(X)$ : ask only that

f_n(x) \to f(x) \quad \text{for every } x \in X.

(2)

This is weaker (we test against fewer functionals), but it has two advantages. First, it uses only elements of $X$ , which we already understand. Second, because the topology is coarser (fewer open sets, fewer open covers), compactness becomes easier.

The weak* topology

Definition 1 (Weak* topology)

The weak* topology on $X^*$ , denoted $\sigma(X^*, X)$ , is the coarsest topology making every evaluation map $\hat{x} : X^* \to \mathbb{R}$ , $\hat{x}(f) = f(x)$ , continuous. A basis of open neighborhoods of $f \in X^*$ is:

V(f; x_1, \ldots, x_n, \varepsilon) = \{ g \in X^* : |g(x_i) - f(x_i)| < \varepsilon \text{ for } i = 1, \ldots, n \}

(3)

where $x_1, \ldots, x_n \in X$ and $\varepsilon > 0$ .

Continuity of $\hat{x}$ requires preimages of open sets to be open. Every open set in $\mathbb{R}$ is a union of open intervals, and $\hat{x}^{-1}((a, b)) = \{g \in X^* : a < g(x) < b\}$ is a slab in $X^*$. So any topology making all evaluation maps continuous must contain all such slabs. The coarsest such topology is the one generated by the slabs alone. A basic weak* open set is a finite intersection of such slabs, so it constrains the values of $g$ at only finitely many test objects.
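To make the finiteness concrete, here is a minimal finite-dimensional sketch. It assumes functionals on $\mathbb{R}^d$ are stored as coefficient vectors, so that $g(x)$ is a dot product; the helper name `in_basic_neighborhood` is hypothetical and chosen only for illustration. It tests membership in $V(f; x_1, \ldots, x_n, \varepsilon)$ with a single test point.

```python
import numpy as np

def in_basic_neighborhood(g, f, test_points, eps):
    """Return True if |g(x_i) - f(x_i)| < eps for every test point x_i.

    Illustrative assumption: functionals on R^d are stored as coefficient
    vectors, so evaluating g at x is the dot product g . x.
    """
    g = np.asarray(g, dtype=float)
    f = np.asarray(f, dtype=float)
    points = np.atleast_2d(np.asarray(test_points, dtype=float))
    return bool(np.all(np.abs(points @ (g - f)) < eps))

f = [1.0, 0.0, 0.0]
x1 = [1.0, 0.0, 0.0]  # the only test point

# Far from f in the untested coordinates, yet inside the neighborhood.
far_but_inside = in_basic_neighborhood([1.0, 5.0, -7.0], f, [x1], eps=0.1)

# Close to f overall, but violates the single tested constraint.
close_but_outside = in_basic_neighborhood([1.2, 0.0, 0.0], f, [x1], eps=0.1)

print(far_but_inside, close_but_outside)
```

The first functional differs from $f$ by a large amount in the coordinates that no test point sees, and still belongs to the neighborhood. This is the sense in which basic weak* neighborhoods are large: they are unconstrained in every direction that the finitely many test objects do not probe.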

Example 2 (The weak* topology on $\ell^1 = (c_0)^*$)

Where weak* sits: the hierarchy of topologies

The weak* topology is one of three natural topologies on $X^*$. The canonical embedding $J : X \hookrightarrow X^{**}$ mediates between them: each $x \in X$ defines an evaluation functional $J(x) \in X^{**}$ by $J(x)(f) = f(x)$.

The weak* topology on $X^*$ is $\sigma(X^*, X)$: slabs come from $J(x)$ for $x \in X$.
The weak topology on $X^*$ is $\sigma(X^*, X^{**})$ : slabs come from all $\phi \in X^{**}$ , including those not in $J(X)$ .
The norm topology on $X^*$ uses norm balls.

Since $J(X) \subseteq X^{**}$ , every weak* open set is also weakly open, and every weakly open set is norm-open. More test functionals means more slabs, which means more open sets:

\text{weak* topology} \subseteq \text{weak topology on } X^* \subseteq \text{norm topology on } X^*.

(5)

Each inclusion means “coarser than or equal to.” The first inclusion is strict when $X$ is not reflexive (there exist elements of $X^{**} \setminus J(X)$ that generate extra slabs). When $X$ is reflexive, $J(X) = X^{**}$ and the weak* and weak topologies on $X^*$ coincide.

Example 3 ($c_0 \hookrightarrow \ell^1 \hookrightarrow \ell^\infty$)

Remark 1 (The compactness payoff)

Weak* convergence

Definition 2 (Weak* convergence)

Let $X$ be a normed space and let $(f_n)_{n \geq 1}$ be a sequence in $X^*$ . We say $f_n \xrightarrow{w^*} f$ (weak* convergence) if

f_n(x) \to f(x) \quad \text{for every } x \in X.

(6)

Now the picture flips: each $f_n$ is an instrument, and the sequence is a sequence of instruments, not objects. Think of replacing your entire measurement apparatus, swapping one thermometer for another, one scale for another. Fix any object $x$ and read off $f_1(x), f_2(x), f_3(x), \ldots$ . Weak* convergence means these readings stabilize to $f(x)$ for every fixed object. Geometrically, each $f_n$ defines a different foliation (different isotherms), and these foliations rearrange from step to step, but at every fixed point the height reading converges.

Proposition 1 (Strong convergence in $X^*$ implies weak* convergence)

Let $X$ be a normed space. If $f_n \to f$ in the norm of $X^*$ , then $f_n \xrightarrow{w^*} f$ .

Proof 1

Proposition 2 (Weak convergence in $X^*$ implies weak* convergence)

Let $X$ be a normed space. If $f_n \rightharpoonup f$ weakly in $X^*$ , then $f_n \xrightarrow{w^*} f$ .

Proof 2

In summary:

\text{strong in } X^* \implies \text{weak in } X^* \implies \text{weak* in } X^*,

(9)

and neither arrow reverses in general. When $X$ is reflexive ( $J$ is surjective), the last two notions coincide.
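Both arrows rest on one-line estimates. As a sketch using only the definitions above, with $J(x)$ the canonical embedding introduced earlier and $x \in X$ fixed:

```latex
|f_n(x) - f(x)| \le \|f_n - f\|_{X^*}\,\|x\| \to 0
\quad \text{(strong} \Rightarrow \text{weak*)},
\qquad
f_n(x) = J(x)(f_n) \to J(x)(f) = f(x)
\quad \text{(weak} \Rightarrow \text{weak*, since } J(x) \in X^{**}\text{)}.
```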

Example 4 (Weak* convergent but not weakly convergent in $\ell^1$ )

Take $X = c_0$ , so $X^* = \ell^1$ and $X^{**} = \ell^\infty$ . The standard basis vectors $e_n \in \ell^1$ converge weak* to 0: for any $x = (x_k) \in c_0$ ,

e_n(x) = x_n \to 0

(10)

since $x \in c_0$ means $x_n \to 0$ . But $e_n$ does not converge weakly to 0 in $\ell^1$ . The element $\phi = (1, 1, 1, \ldots) \in \ell^\infty = (\ell^1)^*$ satisfies

\phi(e_n) = 1 \quad \text{for all } n,

(11)

so $\phi(e_n) \not\to 0$ . This $\phi$ is precisely the kind of “phantom” test functional in $X^{**} \setminus J(X)$ from Example 1: it does not correspond to any object in $c_0$ , but it detects the non-convergence.
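The two computations in this example can be checked numerically. The sketch below is only an illustration under stated assumptions: sequences are truncated to their first `N` coordinates, and the particular choice $x = (1/k)_{k \geq 1} \in c_0$ is ours, not part of the example.

```python
import numpy as np

N = 1000  # truncation length (an assumption; the true sequences are infinite)

x = 1.0 / np.arange(1, N + 1)  # truncation of x = (1/k), an element of c_0
phi = np.ones(N)               # truncation of phi = (1, 1, 1, ...) in l^infty

def e(n):
    """Truncated standard basis vector e_n (1-indexed)."""
    v = np.zeros(N)
    v[n - 1] = 1.0
    return v

indices = (1, 10, 100, 1000)

# Pairing e_n with x in c_0 picks out x_n, which tends to 0 (weak* test).
pair_with_x = [float(e(n) @ x) for n in indices]

# Pairing phi with e_n is identically 1 (this weak test never decays).
pair_with_phi = [float(phi @ e(n)) for n in indices]

print(pair_with_x)
print(pair_with_phi)
```

The readings against $x$ decay like $1/n$, while the readings against $\phi$ stay at 1 for every $n$, matching equations (10) and (11).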

Visualizing weak* convergence in $\mathbb{R}^2$

Consider $f_n(x,y) = (1 - 1/n)\,x + (1/n)\,y$ on $(\mathbb{R}^2, \|\cdot\|_\infty)$ . Each $f_n$ has kernel line $(1 - 1/n)x + (1/n)y = 0$ , which slowly rotates toward the $y$ -axis as $n \to \infty$ . For any fixed point $(a, b)$ :

f_n(a, b) = \left(1 - \frac{1}{n}\right)a + \frac{1}{n}b \to a = f(a, b)

(12)

where $f(x,y) = x$ . The foliations converge pointwise: at each location, the height readings stabilize, even though the kernel lines are visibly rotating from picture to picture.

As with weak convergence, the $\mathbb{R}^2$ picture is a visual scaffold: in finite dimensions weak* = weak = strong, so the foliations converge uniformly, not just pointwise. The genuine weak* phenomenon, where pointwise convergence of height readings does not imply uniform convergence, requires infinite dimensions. The top row below shows the geometry (rotating level sets), but the real content is in the bottom row: height readings at fixed test points stabilizing one by one.

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Polygon

fig, axes = plt.subplots(2, 3, figsize=(15, 10))

# Top row: the foliations f_n for n = 2, 4, ∞
square_verts = np.array([[-1, -1], [1, -1], [1, 1], [-1, 1]])

# Fixed test points
test_points = [
    (1.5, 0.8, r'$p_1$', 'C3'),
    (-0.5, 1.2, r'$p_2$', 'C2'),
    (0.8, -0.6, r'$p_3$', 'C4'),
]

n_values = [2, 4, None]  # None = limit f(x,y) = x
titles_top = [r'$f_2 = \frac{1}{2}x + \frac{1}{2}y$',
              r'$f_4 = \frac{3}{4}x + \frac{1}{4}y$',
              r'$f = x$ (limit)']

for idx, (n_val, title) in enumerate(zip(n_values, titles_top)):
    ax = axes[0, idx]

    # Draw unit square
    sq = Polygon(square_verts, fill=True, facecolor='C0', alpha=0.08,
                 edgecolor='C0', lw=1.5)
    ax.add_patch(sq)

    t = np.linspace(-2.5, 2.5, 200)

    if n_val is not None:
        a = 1 - 1/n_val
        b = 1/n_val
        # Draw level sets: a*x + b*y = level
        for level in np.arange(-3, 3.5, 0.5):
            y_line = (level - a * t) / b
            mask = (y_line > -2.5) & (y_line < 2.5)
            lw = 1.5 if abs(level - round(level)) < 0.01 and level == int(level) else 0.6
            alpha_val = 0.5 if lw > 1 else 0.15
            color = 'C3' if abs(level) < 0.01 else 'gray'
            ax.plot(t[mask], y_line[mask], color=color, alpha=alpha_val, lw=lw)
    else:
        a, b = 1.0, 0.0
        # Limit: f(x,y) = x, level sets are vertical lines x = level
        for level in np.arange(-3, 3.5, 0.5):
            if -2.5 < level < 2.5:
                lw = 1.5 if abs(level - round(level)) < 0.01 and level == int(level) else 0.6
                alpha_val = 0.5 if lw > 1 else 0.15
                color = 'C3' if abs(level) < 0.01 else 'gray'
                ax.axvline(level, color=color, alpha=alpha_val, lw=lw)

    # Plot test points with their heights
    for (px, py, label, color) in test_points:
        height = a * px + b * py
        ax.plot(px, py, 'o', color=color, ms=7, zorder=10)
        ax.text(px + 0.08, py + 0.12, f'{label}: {height:.2f}', fontsize=9, color=color)

    ax.set_xlim(-2.2, 2.2)
    ax.set_ylim(-2.2, 2.2)
    ax.set_aspect('equal')
    ax.set_title(title, fontsize=12)
    ax.set_xlabel(r'$x$')
    ax.set_ylabel(r'$y$')

# Bottom row: height readings at each test point as n varies
n_range = np.arange(2, 20)
for idx, (px, py, label, color) in enumerate(test_points):
    ax = axes[1, idx]
    heights = [(1 - 1/n) * px + (1/n) * py for n in n_range]
    limit = px  # f(x,y) = x

    ax.plot(n_range, heights, 'o-', color=color, ms=5, lw=1.5)
    ax.axhline(limit, color=color, ls='--', lw=2, alpha=0.6,
               label=f'limit $f({label[1:-1]}) = {limit}$')
    ax.set_xlabel(r'$n$')
    ax.set_ylabel(f'$f_n({label[1:-1]})$')
    ax.set_title(f'Height readings at {label}', fontsize=11)
    ax.legend(fontsize=9)
    ax.set_ylim(min(min(heights), limit) - 0.3, max(max(heights), limit) + 0.3)

plt.suptitle(r'Weak* convergence: foliations rotate, height readings stabilize at each point',
             fontsize=13, y=1.01)
plt.tight_layout()
plt.show()

Top row: the level sets of $f_n$ rotate as $n$ increases. For $n = 2$ the level sets are diagonal (slope $-1$); by $n = 4$ they have steepened to slope $-3$; at the limit $f(x,y) = x$ they are exactly vertical. Bottom row: for each fixed test point, the height reading $f_n(p)$ converges to the limiting value $f(p) = p_x$. This is weak* convergence: the foliations rearrange, but at every fixed point the readings stabilize.
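The earlier remark that in finite dimensions the convergence is uniform, not just pointwise, can also be checked directly. The following is a small sketch, assuming the same $f_n$ and limit $f$ as in the figure; the function name `operator_norm_gap` is hypothetical. It computes $\sup_{\|p\|_\infty \leq 1} |f_n(p) - f(p)|$ by evaluating at the four corners of the unit square, which suffices because a linear functional attains its maximum over a square at a corner.

```python
import numpy as np

# Corners of the unit ball of (R^2, sup norm).
corners = np.array([[-1, -1], [1, -1], [1, 1], [-1, 1]], dtype=float)

def operator_norm_gap(n):
    """sup over the unit square of |f_n(p) - f(p)|, where f(x, y) = x."""
    coeff_n = np.array([1 - 1 / n, 1 / n])
    coeff_limit = np.array([1.0, 0.0])
    diff = coeff_n - coeff_limit
    return float(np.max(np.abs(corners @ diff)))

gaps = {n: operator_norm_gap(n) for n in (2, 4, 10, 100)}
print(gaps)
```

The gap equals $2/n$, the $\ell^1$ norm of the coefficient difference $(-1/n, 1/n)$, so $\|f_n - f\| \to 0$: here the weak* convergent sequence also converges in norm.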

The Banach-Alaoglu theorem

We claimed that the weak* topology makes compactness easier. The following theorem delivers on this promise: the closed unit ball of $X^*$ is always weak* compact, for any normed space $X$ .

The intuition is simple. Each $f \in B_{X^*}$ is determined by its readings on all objects, and since $\|f\| \leq 1$ , each reading satisfies $|f(x)| \leq \|x\|$ , so $f(x) \in [-\|x\|, \|x\|]$ . An instrument is a choice of one bounded number per object. Given a sequence of instruments, focus on a single object $x_1$ : the readings $f_n(x_1)$ are bounded, so Bolzano-Weierstrass gives a convergent subsequence. Extract a further subsequence to make the readings converge at $x_2$ , then $x_3$ , and so on.

This is the same diagonal argument behind Arzelà-Ascoli and compact operators: whenever there are countably many coordinates to control, Bolzano-Weierstrass plus diagonalization does the job. If $X$ is separable, a countable dense subset provides the coordinates, and the argument goes through.

We give the full proof for separable spaces, then state the general result.

Theorem 1 (Banach-Alaoglu (separable case))

Let $X$ be a separable Banach space and $(f_n) \subset X^*$ a bounded sequence with $\|f_n\| \leq M$ . Then $(f_n)$ has a weak* convergent subsequence.

Proof 3

Step 1: set up the countable dense subset. Since $X$ is separable, there exists a countable dense subset $\{x_1, x_2, x_3, \ldots\} \subset X$ .

Step 2: successive extraction. Evaluate the sequence $(f_n)$ at $x_1$ . Since $|f_n(x_1)| \leq M\|x_1\|$ for all $n$ , the sequence $(f_n(x_1))$ is bounded in $\mathbb{R}$ . By Bolzano-Weierstrass, extract a convergent subsequence:

(f_n) \;\supset\; (f_{n_1^{(j)}})_{j \geq 1} \quad \text{such that } f_{n_1^{(j)}}(x_1) \text{ converges.}

(13)

Now evaluate this subsequence at $x_2$ . The numbers $(f_{n_1^{(j)}}(x_2))$ are again bounded, so extract a further subsequence:

(f_{n_1^{(j)}}) \;\supset\; (f_{n_2^{(j)}})_{j \geq 1} \quad \text{such that } f_{n_2^{(j)}}(x_1) \text{ and } f_{n_2^{(j)}}(x_2) \text{ both converge.}

(14)

The second subsequence still converges at $x_1$ because it is a subsequence of something that already converged there. Continue: at stage $k$ , extract a subsequence $(f_{n_k^{(j)}})_{j \geq 1}$ that converges at $x_1, \ldots, x_k$ . This gives nested subsequences:

(f_n) \;\supset\; \underbrace{(f_{n_1^{(j)}})}_{\text{conv at } x_1} \;\supset\; \underbrace{(f_{n_2^{(j)}})}_{\text{conv at } x_1, x_2} \;\supset\; \cdots \;\supset\; \underbrace{(f_{n_k^{(j)}})}_{\text{conv at } x_1, \ldots, x_k} \;\supset\; \cdots

(15)

Step 3: the diagonal trick. Define $g_j := f_{n_j^{(j)}}$ , the $j$ -th element of the $j$ -th subsequence. For any fixed $k$ and $j \geq k$ , the element $g_j = f_{n_j^{(j)}}$ belongs to the $j$ -th subsequence, which is a sub-subsequence of the $k$ -th subsequence. So the tail $g_k, g_{k+1}, g_{k+2}, \ldots$ is a subsequence of the $k$ -th extracted sequence, which converges at $x_k$ . Since finitely many initial terms do not affect convergence, $(g_j(x_k))_{j \geq 1}$ converges for every $k$ . Call the limit $L_k$ .

Step 4: extend from the dense subset to all of $X$ . Take arbitrary $x \in X$ . Given $\varepsilon > 0$ , pick $x_k$ from the dense subset with $\|x - x_k\| < \varepsilon$ . Then:

|g_j(x) - g_l(x)| \leq |g_j(x) - g_j(x_k)| + |g_j(x_k) - g_l(x_k)| + |g_l(x_k) - g_l(x)|.

(16)

The first and third terms are each bounded by $M\|x - x_k\| < M\varepsilon$, since $\|g_j\|, \|g_l\| \leq M$. The middle term is less than $\varepsilon$ for $j, l$ large enough since $(g_j(x_k))$ converges. So $|g_j(x) - g_l(x)| < (2M + 1)\varepsilon$ for $j, l$ sufficiently large. Since $\varepsilon > 0$ was arbitrary, the sequence $(g_j(x))$ is Cauchy in $\mathbb{R}$, hence convergent. Define $g(x) := \lim_{j \to \infty} g_j(x)$.

Step 5: the limit is in $X^*$ . Linearity of $g$ follows from linearity of each $g_j$ and limits. Boundedness follows from $|g(x)| = \lim |g_j(x)| \leq M\|x\|$ , so $\|g\| \leq M$ and $g \in X^*$ . By construction $g_j(x) \to g(x)$ for all $x \in X$ , which is exactly $g_j \xrightarrow{w^*} g$ .
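The bookkeeping in Steps 2 and 3 can be illustrated with a toy model. This is a sketch under loud assumptions, not the proof itself: the readings $f_n(x_k)$ are replaced by a hypothetical two-valued rule (`reading`, based on the binary digits of $n$), so that "extract a convergent subsequence" becomes "keep the indices where the reading is constant", and the sequence is truncated to finitely many terms.

```python
def reading(n, k):
    """Toy reading f_n(x_k): +1 or -1 according to bit k-1 of n (illustrative)."""
    return 1 - 2 * ((n >> (k - 1)) & 1)

def nested_subsequences(num_terms, num_points):
    """Stage k keeps the indices on which the readings at x_k are constant (+1)."""
    current = list(range(1, num_terms + 1))
    stages = []
    for k in range(1, num_points + 1):
        current = [n for n in current if reading(n, k) == 1]
        stages.append(current)
    return stages

K = 4
stages = nested_subsequences(num_terms=200, num_points=K)

# Diagonal: the j-th element of the j-th subsequence (1-indexed as in the proof).
diagonal = [stages[j - 1][j - 1] for j in range(1, K + 1)]
print(diagonal)

# From position k onward, the diagonal readings at x_k are constant.
tail_readings = {
    k: [reading(diagonal[j - 1], k) for j in range(k, K + 1)]
    for k in range(1, K + 1)
}
print(tail_readings)
```

In this model the full sequence oscillates at every $x_k$ (for instance the readings at $x_1$ alternate in sign), each stage is a subsequence of the previous one, and the diagonal indices settle at every test point from position $k$ onward, which is the content of Step 3.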

Leonidas Alaoglu extended this result to non-separable spaces in his 1938 PhD thesis at the University of Chicago, replacing the diagonal argument with Tychonoff’s theorem.

Remark 2 (The general (non-separable) case)

Corollary 1 (Weak sequential compactness in separable reflexive spaces)

Let $X$ be a separable reflexive Banach space. Then every bounded sequence in $X$ has a weakly convergent subsequence.

Proof 4

This corollary is the reason reflexivity matters in applications. In a separable reflexive space (such as $L^p$ or $W^{k,p}$ for $1 < p < \infty$ ), bounded sequences always have weakly convergent subsequences. Without reflexivity, Banach-Alaoglu still gives weak* compactness of $B_{X^*}$ , but this is a statement about functionals on $X$ , not about elements of $X$ itself.
