For applications in PDE and the calculus of variations, we need a topology that is Hausdorff (so limits are unique) and has many compact sets (so we can extract convergent subsequences). These demands pull in opposite directions. The weak topology is the sweet spot: coarse enough for compactness, fine enough for separation.
What we need from a topology
For the existence arguments that drive much of applied analysis, we need two things from a topology on a Banach space $X$.
Recall that convergence in a topological space is defined in terms of open sets: a sequence $(x_n)$ in a topological space $(X, \tau)$ converges to $x$ if for every open set $U$ containing $x$, there exists $N$ such that $x_n \in U$ for all $n \geq N$. In a metric space, this reduces to the familiar $\varepsilon$-$N$ definition, but in a general topological space the open sets may look very different from metric balls.
Hausdorff (unique limits). We need limits to be unique: if $x_n \to x$ and $x_n \to y$, then $x = y$. A topology is Hausdorff if any two distinct points can be separated by disjoint open sets. This forces uniqueness: if $x \neq y$, choose disjoint open sets $U \ni x$ and $V \ni y$. Eventually $x_n \in U$ and eventually $x_n \in V$, but $U \cap V = \emptyset$, a contradiction. Without the Hausdorff property, a sequence can converge to multiple points simultaneously (in the indiscrete topology $\{\emptyset, X\}$, every sequence converges to every point), and limit arguments become meaningless.
Compact sets (convergent subsequences). We need to extract convergent subsequences from bounded families. A set $K$ is compact if every open cover has a finite subcover. Why does this give subsequences? Take a sequence $(x_n)$ in $K$. Any open cover of $K$ has a finite subcover $U_1, \dots, U_m$. Since infinitely many terms of the sequence are distributed among finitely many sets, at least one $U_j$ contains infinitely many $x_n$. Refining the cover makes this $U_j$ as small as we like, forcing the terms to cluster around some point of $K$. In a metric space this yields a convergent subsequence; in a general topological space it yields a cluster point, and extracting an actual subsequence needs extra structure. Compactness guarantees that sequences cannot escape.
The tension. These two demands pull in opposite directions. The indiscrete topology $\{\emptyset, X\}$ makes every space trivially compact (the only open sets available for a cover are $\emptyset$ and $X$), but no two points can be separated. The discrete topology (every subset is open) makes every space Hausdorff, but only finite sets are compact: given an infinite set $A$, the cover $\{\{a\} : a \in A\}$ consists of open singletons and has no finite subcover. In general, making a topology coarser helps compactness but threatens separation; making it finer helps separation but threatens compactness.
In finite dimensions, the norm topology gives us both: it is Hausdorff and closed bounded sets are compact (Heine-Borel). In infinite dimensions, the norm topology remains Hausdorff but loses compactness: the closed unit ball is never norm-compact (Riesz’s theorem). For instance, the standard basis vectors $e_n$ in $\ell^2$ all lie on the unit sphere with $\|e_n - e_m\| = \sqrt{2}$ for $n \neq m$, so no subsequence is Cauchy. The vectors “escape” into new dimensions and never cluster. The norm topology has too many open sets.
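The distance claim for the basis vectors is easy to check numerically. The following is a minimal sketch, assuming we truncate $\ell^2$ to finitely many coordinates (the helper names `basis_vector`, `norm`, and `dist` and the truncation level `dim` are illustrative choices, not part of the text).

```python
import math

def basis_vector(n, dim):
    """Return e_n (1-indexed) as a list of length dim."""
    return [1.0 if k == n else 0.0 for k in range(1, dim + 1)]

def norm(x):
    """Euclidean (l^2) norm of a finite list."""
    return math.sqrt(sum(c * c for c in x))

def dist(x, y):
    return norm([a - b for a, b in zip(x, y)])

dim = 50  # truncation level: an assumption, since l^2 itself is infinite-dimensional
vectors = [basis_vector(n, dim) for n in range(1, 11)]

# All pairwise distances between distinct basis vectors
pairwise = [dist(vectors[i], vectors[j])
            for i in range(10) for j in range(i + 1, 10)]

print(min(pairwise), max(pairwise), math.sqrt(2))
```

Every pair sits at the same distance $\sqrt{2}$, so no subsequence of $(e_n)$ can be Cauchy, however it is chosen.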
The idea is to weaken the topology just enough to recover compactness while keeping the Hausdorff property. The Hahn-Banach theorem makes this possible: it guarantees enough continuous linear functionals to separate points, so the topology they generate is Hausdorff. The resulting weak topology has fewer open sets (and therefore fewer open covers), making compactness easier to achieve.
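From norm balls to weak neighborhoods
In a normed space $X$, the norm topology is generated by the open balls
$$B_r(x_0) = \{x \in X : \|x - x_0\| < r\}.$$
The idea behind the weak topology is to relax the notion of neighborhood. The building blocks are slabs: for a single functional $f \in X^*$ and $\varepsilon > 0$, define
$$S_{f,\varepsilon}(x_0) = \{x \in X : |f(x) - f(x_0)| < \varepsilon\}.$$
Geometrically, $f$ defines a family of parallel hyperplanes $\{f = c\}$, and the slab is the region between $\{f = f(x_0) - \varepsilon\}$ and $\{f = f(x_0) + \varepsilon\}$: a strip of width $2\varepsilon / \|f\|$ in the direction normal to the hyperplanes. It constrains $x$ only in the direction that $f$ measures, and is unbounded in every direction within $\ker f$.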
import numpy as np
import matplotlib.pyplot as plt
fig, axes = plt.subplots(1, 2, figsize=(10, 5))
# --- Panel 1: Norm ball ---
ax = axes[0]
theta = np.linspace(0, 2 * np.pi, 200)
ax.fill(np.cos(theta), np.sin(theta), alpha=0.15, color='C0')
ax.plot(np.cos(theta), np.sin(theta), 'C0', lw=2)
ax.set_xlim(-2.5, 2.5)
ax.set_ylim(-2.5, 2.5)
ax.set_aspect('equal')
ax.axhline(0, color='gray', lw=0.5, alpha=0.3)
ax.axvline(0, color='gray', lw=0.5, alpha=0.3)
ax.set_title('Norm ball $B_1(0)$\nBounded in all directions', fontsize=11)
ax.set_xlabel(r'$x_1$')
ax.set_ylabel(r'$x_2$')
# --- Panel 2: Rotated slab for f(x) = (x1 + x2)/sqrt(2) ---
ax = axes[1]
# The functional f(x) = (x1+x2)/sqrt(2) has ||f||=1.
# The slab |f(x)| < 1 is the region between the lines x1+x2 = ±sqrt(2).
# Normal direction: (1,1)/sqrt(2), i.e. 45 degrees.
angle = np.pi / 4
normal = np.array([np.cos(angle), np.sin(angle)])
tangent = np.array([-np.sin(angle), np.cos(angle)])
half_width = 1.0 # epsilon / ||f|| = 1
# Draw the slab as a filled polygon
L = 4.0 # extent along the slab
corners = np.array([
-L * tangent + half_width * normal,
L * tangent + half_width * normal,
L * tangent - half_width * normal,
-L * tangent - half_width * normal,
])
slab = plt.Polygon(corners, alpha=0.15, color='C1', ec='none')
ax.add_patch(slab)
# Draw the boundary lines
for sign in [1, -1]:
    p1 = -L * tangent + sign * half_width * normal
    p2 = L * tangent + sign * half_width * normal
    ax.plot([p1[0], p2[0]], [p1[1], p2[1]], 'C1', lw=2, ls='--')
# Arrows showing unbounded direction along the slab
ax.annotate('', xy=1.9 * tangent, xytext=1.2 * tangent,
arrowprops=dict(arrowstyle='->', color='C1', lw=1.5))
ax.annotate('', xy=-1.9 * tangent, xytext=-1.2 * tangent,
arrowprops=dict(arrowstyle='->', color='C1', lw=1.5))
ax.text(-1.6, 1.9, 'unbounded', fontsize=9, color='C1', style='italic',
rotation=-45)
# Show a point inside the slab but far from the origin
yn = 0.3 * normal + 2.0 * tangent
ax.plot(*yn, 'C3o', ms=8, zorder=5)
ax.text(yn[0] + 0.15, yn[1] - 0.3, r'$y_n$ here!', fontsize=9, color='C3')
# Draw the normal vector
ax.annotate('', xy=1.2 * normal, xytext=(0, 0),
arrowprops=dict(arrowstyle='->', color='C4', lw=1.5))
ax.text(0.55, 0.85, r'$a$', fontsize=11, color='C4')
ax.set_xlim(-2.5, 2.5)
ax.set_ylim(-2.5, 2.5)
ax.set_aspect('equal')
ax.axhline(0, color='gray', lw=0.5, alpha=0.3)
ax.axvline(0, color='gray', lw=0.5, alpha=0.3)
ax.set_title(r'Weak slab $|\langle x, a \rangle| < 1$'
+ '\nUnbounded along $\\ker f$', fontsize=11)
ax.set_xlabel(r'$x_1$')
ax.set_ylabel(r'$x_2$')
plt.tight_layout()
plt.show()
Left: the norm ball is bounded in every direction. Right: the slab $\{|\langle x, a \rangle| < 1\}$ constrains only the direction normal to $\ker f$ and extends to infinity along $\ker f$. The vector $a$ is the normal direction. The point $y_n$ lies inside the slab but far outside the norm ball.
A weak open set is built up in three layers. Recall that a basis for a topology is a collection $\mathcal{B}$ of open sets such that every open set in the topology is a union of members of $\mathcal{B}$. Equivalently, for every open set $U$ and every $x \in U$, there exists $B \in \mathcal{B}$ with $x \in B \subseteq U$. In the norm topology, the open balls form a basis. In the weak topology, the role of open balls is played by tubes.
Sub-basic open sets (slabs). A single slab imposes one linear constraint. It is an infinite strip, constraining one direction and placing no restriction on any direction in $\ker f$.
Basic open sets (finite intersections of slabs). A finite intersection
$$\bigcap_{i=1}^{n} \{x \in X : |f_i(x) - f_i(x_0)| < \varepsilon\}$$
constrains $n$ directions simultaneously. Think of this as a tube: bounded cross-section in finitely many directions, infinite extent in all others. These tubes form a basis for the weak topology.
General open sets (arbitrary unions of tubes). Every weakly open set is a union of tubes, just as every norm-open set is a union of open balls.
The key point: in infinite dimensions, every tube is unbounded (since $\dim \bigcap_{i=1}^{n} \ker f_i = \infty$ always). Since every point in a weakly open set has a tube around it, and every tube is unbounded, every nonempty weakly open set is unbounded in infinite dimensions. No bounded set like a norm ball can be weakly open, and no bounded set can even contain a nonempty weakly open set.
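A finite-dimensional stand-in shows the mechanism. The sketch below assumes $X = \mathbb{R}^3$ with two hypothetical functionals `f1`, `f2` (the numbers are arbitrary illustrations); their common kernel is spanned by the cross product, and moving along it never leaves the tube. In infinite dimensions the same thing happens for any finite family of functionals, because the common kernel is never trivial.

```python
import math

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def cross(a, b):
    """Cross product in R^3: a vector orthogonal to both a and b."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

# Two functionals on R^3, written as coefficient vectors (illustrative values)
f1 = (1.0, 2.0, -1.0)
f2 = (0.0, 1.0, 3.0)
x0 = (0.5, -0.25, 1.0)   # centre of the tube
eps = 0.1

def in_tube(x):
    """True if |f_i(x) - f_i(x0)| < eps for both functionals."""
    d = tuple(xi - ci for xi, ci in zip(x, x0))
    return abs(dot(f1, d)) < eps and abs(dot(f2, d)) < eps

v = cross(f1, f2)        # direction in ker f1 ∩ ker f2

for t in (1.0, 1e3, 1e6):
    x = tuple(ci + t * vi for ci, vi in zip(x0, v))
    print(t, in_tube(x), math.dist(x, x0))

# Moving along f1 itself, by contrast, leaves the tube immediately
off_axis = tuple(ci + fi for ci, fi in zip(x0, f1))
print(in_tube(off_axis))
```

The points `x0 + t*v` stay in the tube while their distance from `x0` grows without bound, which is exactly why a tube cannot fit inside a norm ball.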
Definition 1 (Weak topology)
The weak topology on $X$, denoted $\sigma(X, X^*)$, is the coarsest topology making every $f \in X^*$ continuous. A neighborhood basis at $x_0$ consists of the basic open sets:
$$U(x_0; f_1, \dots, f_n; \varepsilon) = \{x \in X : |f_i(x) - f_i(x_0)| < \varepsilon, \ i = 1, \dots, n\},$$
where $f_1, \dots, f_n \in X^*$ and $\varepsilon > 0$.
Continuity of $f$ requires preimages of open sets to be open. Every open set in $\mathbb{R}$ is a union of open intervals, and $f^{-1}((a, b))$ is exactly a slab between two parallel hyperplanes. So any topology making all $f \in X^*$ continuous must contain all slabs. The coarsest such topology is the one generated by the slabs alone, with no extra open sets added.
The weak topology is Hausdorff: if $x \neq y$, then by the Hahn-Banach theorem there exists $f \in X^*$ with $f(x) \neq f(y)$, so $x$ and $y$ lie in disjoint slabs. Without Hahn-Banach, we would have no guarantee that $X^*$ contains enough functionals to distinguish points, and the weak topology could fail to be Hausdorff.
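Example 1 (The weak topology on $\ell^2$)
Take $X = \ell^2$ with its standard orthonormal basis $(e_n)$. By the Riesz representation theorem, every $f \in (\ell^2)^*$ is of the form $f(x) = \langle x, a \rangle$ for some $a \in \ell^2$. Consider the functional $f_1(x) = \langle x, e_1 \rangle = x_1$. The slab
$$\{x \in \ell^2 : |x_1| < \varepsilon\}$$
is the region between the hyperplanes $\{x_1 = -\varepsilon\}$ and $\{x_1 = \varepsilon\}$. This slab contains vectors with arbitrarily large norm, as long as their first component is small. For instance, $y_n = n e_2$ satisfies $\langle y_n, e_1 \rangle = 0$, so $y_n$ lies in the slab for every $n$, yet $\|y_n\| = n \to \infty$. The slab sees only the first coordinate and is blind to the $e_2$ direction.
Adding more functionals narrows the neighborhood but never makes it bounded. The set $\{x : |x_1| < \varepsilon, \dots, |x_N| < \varepsilon\}$ constrains $N$ coordinates but leaves infinitely many unconstrained. This is a tube: bounded cross-section in the first $N$ coordinates, unbounded in all others.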
Remark 1 (Open, closed, or neither)
Every set in a topological space falls into exactly one of four categories: (1) open but not closed, (2) closed but not open, (3) both (clopen), or (4) neither. In most topologies of interest, including the norm and weak topologies on a normed space, the vast majority of sets are in category (4).
When passing from the norm topology to the weak topology, sets can only move toward category (4). A norm-open set may lose its openness; a norm-closed set may lose its closedness. No set gains either property, because every weakly open set is norm-open and every weakly closed set is norm-closed. The “neither” category grows at the expense of the others.
For example, in $\ell^2$ with the norm topology, the half-open shell $\{x : 1 \leq \|x\| < 2\}$ is neither open nor closed. In the weak topology, the open ball $B_1(0)$ also joins the “neither” category: it is no longer weakly open (it is bounded), and it is not weakly closed either.
Remark 2 (Open and closed sets in the weak topology)
The weak topology has fewer open sets than the norm topology. Two consequences:
Harder to be open. Norm-open balls are generally not weakly open in infinite dimensions (they would need to contain an unbounded weak neighborhood, which they cannot).
Fewer closed sets, not more. A common misconception is that fewer open sets means more closed sets. In fact, closed sets are complements of open sets, so fewer open sets means fewer closed sets too. A set that is “not open” in the weak topology does not become closed. Most sets are neither.
However, convex norm-closed sets remain weakly closed; this is Mazur's theorem. The closed unit ball is weakly closed because it is convex, not because of a general principle about coarser topologies. Non-convex norm-closed sets can fail to be weakly closed. For example, the set of standard basis vectors $\{e_n\}$ in $\ell^2$ is norm-closed (all pairwise distances are $\sqrt{2}$), but $e_n \rightharpoonup 0$ weakly, so 0 is in its weak closure.
Remark 3 (The unit ball as an intersection of slabs)
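The slab picture gives a clean characterization of the unit ball. Each $f \in X^*$ with $\|f\| \leq 1$ defines a closed slab $\{x : |f(x)| \leq 1\}$, and the unit ball fits inside every such slab. The sup formula $\|x\| = \sup_{\|f\| \leq 1} |f(x)|$ gives the converse:
$$\bar{B}_1(0) = \bigcap_{\|f\| \leq 1} \{x \in X : |f(x)| \leq 1\}.$$
The unit ball is exactly the intersection of all dual slabs. This means $\bar{B}_1(0)$ is weakly closed (it is an intersection of weakly closed sets, since each $f$ is weakly continuous).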
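The sup formula behind this characterization can be checked by brute force in a toy setting. The sketch below assumes $X = \mathbb{R}^2$ with the Euclidean norm, where every unit-norm functional is $f_\theta(y) = \langle y, (\cos\theta, \sin\theta) \rangle$; the sampling resolution and the two test points are arbitrary illustrative choices.

```python
import math

def reading(theta, x):
    """Value on x of the unit-norm functional f_theta(y) = <y, (cos theta, sin theta)>."""
    return x[0] * math.cos(theta) + x[1] * math.sin(theta)

def sup_over_unit_functionals(x, samples=3600):
    """Approximate sup over unit functionals of |f(x)| by sampling directions."""
    return max(abs(reading(2 * math.pi * k / samples, x)) for k in range(samples))

inside = (0.6, -0.3)    # Euclidean norm below 1
outside = (0.9, 0.8)    # Euclidean norm above 1

for x in (inside, outside):
    print(x, sup_over_unit_functionals(x), math.hypot(*x))
```

The sampled supremum matches the norm, so a point lies in every closed slab $\{|f_\theta| \leq 1\}$ precisely when its norm is at most 1.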
Strong and weak convergence
With the weak topology defined, we can now describe the corresponding notions of convergence. We begin with norm convergence for comparison, then turn to weak convergence.
Definition 2 (Strong convergence)
Let $X$ be a normed space and let $(x_n)$ be a sequence in $X$. We say $x_n \to x$ strongly (or in norm) if
$$\|x_n - x\| \to 0.$$
This is the familiar notion: the points $x_n$ physically move toward $x$, and the distance shrinks to zero. Since $|f(x_n) - f(x)| \leq \|f\| \, \|x_n - x\|$, strong convergence forces all instrument readings to converge uniformly over $\|f\| \leq 1$. But it is defined by the norm directly, not by the instruments.
Definition 3 (Weak convergence)
Let $X$ be a normed space and let $(x_n)$ be a sequence in $X$. We say $x_n \rightharpoonup x$ weakly if
$$f(x_n) \to f(x) \quad \text{for every } f \in X^*.$$
Each instrument $f$ foliates $X$ into level sets $\{f = c\}$, the “isotherms” for that measurement. Weak convergence means: in every foliation, the readings $f(x_n)$ settle down to $f(x)$. The objects $x_n$ need not become close to $x$; they can keep bouncing around, as long as every instrument eventually reads the same value as it does on $x$.
Proposition 1 (Strong convergence implies weak convergence)
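Let $X$ be a normed space. If $x_n \to x$ strongly, then $x_n \rightharpoonup x$ weakly.
Proof 1
For every $f \in X^*$, $|f(x_n) - f(x)| \leq \|f\| \, \|x_n - x\| \to 0$.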
The converse is false in infinite dimensions:
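Example 2 (Weak but not strong convergence in $L^2$)
In $L^2(0,1)$, the sequence $x_n(t) = \sin(n\pi t)$ converges weakly to 0, but not strongly. Each $x_n$ has norm $1/\sqrt{2}$, so the sequence stays on a sphere. But for any $g \in L^2(0,1)$, the Riemann-Lebesgue lemma gives:
$$\langle g, x_n \rangle = \int_0^1 g(t) \sin(n\pi t) \, dt \to 0,$$
so $f(x_n) \to 0$ for every functional $f \in (L^2)^*$, i.e., $x_n \rightharpoonup 0$.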
import numpy as np
import matplotlib.pyplot as plt
fig, axes = plt.subplots(1, 3, figsize=(15, 4.5))
t = np.linspace(0, 1, 500)
# --- Panel 1: The functions sin(nπt) ---
ax = axes[0]
for n, color in [(1, 'C0'), (3, 'C1'), (7, 'C2'), (15, 'C4')]:
    ax.plot(t, np.sin(n * np.pi * t), color=color, lw=1.5, alpha=0.8,
            label=rf'$\sin({n}\pi t)$')
ax.axhline(0, color='C3', lw=2.5, ls='-', alpha=0.7, label=r'limit $x = 0$')
ax.set_xlabel(r'$t$', fontsize=11)
ax.set_ylabel(r'$x_n(t)$', fontsize=11)
ax.set_title('The sequence: oscillates, never settles', fontsize=11)
ax.legend(fontsize=8, loc='upper right')
ax.tick_params(labelsize=9)
# --- Panel 2: ||x_n|| stays constant ---
ax = axes[1]
ns = np.arange(1, 21)
norms = np.ones_like(ns, dtype=float) / np.sqrt(2)
ax.plot(ns, norms, 'C3o-', ms=5, lw=1.5)
ax.axhline(1/np.sqrt(2), color='C3', ls='--', alpha=0.4)
ax.axhline(0, color='gray', ls='-', alpha=0.3)
ax.set_xlabel(r'$n$', fontsize=11)
ax.set_ylabel(r'$\|x_n\|_{L^2}$', fontsize=11)
ax.set_title(rf'Norm: $\|x_n\| = 1/\sqrt{{2}}$ (no convergence to 0)', fontsize=11)
ax.set_ylim(-0.1, 1.0)
ax.tick_params(labelsize=9)
# --- Panel 3: Inner products with several g decay to 0 ---
ax = axes[2]
test_funcs = [
(lambda t: np.ones_like(t), r'$g = 1$', 'C0'),
(lambda t: t, r'$g = t$', 'C1'),
(lambda t: np.cos(2*np.pi*t), r'$g = \cos(2\pi t)$', 'C2'),
(lambda t: np.where(t < 0.5, 1.0, -1.0), r'$g = \mathrm{sgn}(1/2-t)$', 'C4'),
]
ns = np.arange(1, 26)
dt = t[1] - t[0]
for g_func, label, color in test_funcs:
    g_vals = g_func(t)
    inner_prods = [np.sum(g_vals * np.sin(n * np.pi * t)) * dt for n in ns]
    ax.plot(ns, inner_prods, 'o-', color=color, ms=4, lw=1.2, label=label)
ax.axhline(0, color='gray', ls='-', alpha=0.3)
ax.set_xlabel(r'$n$', fontsize=11)
ax.set_ylabel(r'$\langle g, x_n \rangle$', fontsize=11)
ax.set_title(r'Every foliation height $\to 0$: weak convergence', fontsize=11)
ax.legend(fontsize=8, loc='upper right')
ax.tick_params(labelsize=9)
plt.tight_layout()
plt.show()
Left: the functions $x_n(t) = \sin(n\pi t)$ oscillate faster and faster, never settling down pointwise. Center: the norm stays at $1/\sqrt{2}$ for every $n$, so the sequence does not converge strongly. Right: the inner product $\langle g, x_n \rangle$ decays to 0 for every test function $g$, no matter its shape. Every “foliation height” converges, even though the functions themselves keep bouncing around. This is weak convergence without strong convergence.
Remark 4 (Pointwise vs. uniform convergence over the dual)
Why does weak convergence not imply strong convergence? After all, weak convergence requires $f(x_n) \to f(x)$ for all $f \in X^*$; isn’t “all” a strong demand?
The key is that weak convergence is pointwise in $f$: we fix $f$ first, then send $n \to \infty$. For any fixed $g \in L^2(0,1)$, the Fourier coefficients $\langle g, x_n \rangle \to 0$ because the terms of a square-summable series tend to zero (Bessel's inequality). Each instrument eventually loses track of the oscillation.
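But we are free to change the instrument with $n$. Choose $f_n = x_n / \|x_n\|$, i.e., the instrument that is perfectly aligned with $x_n$. Then
$$f_n(x_n) = \frac{\langle x_n, x_n \rangle}{\|x_n\|} = \|x_n\| = \frac{1}{\sqrt{2}}$$
for every $n$: this “adaptive” instrument always catches the oscillation. The supremum over the unit ball,
$$\sup_{\|f\| \leq 1} |f(x_n)| = \|x_n\| = \frac{1}{\sqrt{2}},$$
never decays. But this supremum is the norm of $x_n$, and driving it to zero would be exactly strong convergence.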
Why is this not a problem for weak convergence? Because the definition (Definition 3) quantifies as: for every fixed $f \in X^*$, $f(x_n) \to f(x)$. The functional is chosen once and for all, and then we ask whether the sequence of numbers $f(x_n)$ converges. An adaptive choice $f_n$ that changes with $n$ does not define a single sequence of real numbers: it defines a different measurement at each step. This is not what any individual instrument reads; it is a meta-observation assembled by switching instruments. No single linear measuring instrument in $X^*$ witnesses the non-convergence.
So the gap between weak and strong convergence is precisely the gap between pointwise and uniform convergence over $\{f \in X^* : \|f\| \leq 1\}$: each fixed linear instrument eventually stops detecting the oscillation, but the worst-case instrument shifts with $n$ to stay aligned with $x_n$.
The identity $\|x\| = \sup_{\|f\| \leq 1} |f(x)|$ is a consequence of the Hahn-Banach theorem: for any $x \neq 0$, there exists a norming functional $f \in X^*$ with $\|f\| = 1$ and $f(x) = \|x\|$. So the supremum over the unit ball of $X^*$ is always attained, and equals the norm. In other words, the norm of $x$ is exactly the largest reading that any unit-norm linear instrument can produce on $x$.
This is also where the Banach-Steinhaus theorem (Theorem 1) enters: weak convergence implies that $f(x_n)$ is bounded for each fixed $f \in X^*$. Each $x_n$ acts as a bounded linear functional on $X^*$ via evaluation, $J(x_n)(f) = f(x_n)$, and the family $\{J(x_n)\}$ is pointwise bounded on $X^*$. The uniform boundedness principle then gives $\sup_n \|x_n\| < \infty$. So weak convergence automatically implies norm-boundedness, but not norm-convergence.
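The contrast between a fixed instrument and the adaptive one can be made concrete for $x_n(t) = \sin(n\pi t)$. The following is a rough numerical sketch, assuming a midpoint-rule approximation of the $L^2(0,1)$ inner product; the helper names `inner`, `x`, `fixed_reading`, and `adaptive_reading` are illustrative, and $g(t) = t$ is just one choice of fixed instrument.

```python
import math

def inner(g, h, m=20000):
    """Midpoint-rule approximation of the L^2(0,1) inner product <g, h>."""
    dt = 1.0 / m
    return sum(g((k + 0.5) * dt) * h((k + 0.5) * dt) for k in range(m)) * dt

def x(n):
    """The function x_n(t) = sin(n pi t)."""
    return lambda t: math.sin(n * math.pi * t)

def fixed_reading(n):
    """Reading of the fixed instrument g(t) = t on x_n."""
    return inner(lambda t: t, x(n))

def adaptive_reading(n):
    """Reading of f_n = x_n / ||x_n|| on x_n, i.e. <x_n, x_n> / ||x_n||."""
    sq = inner(x(n), x(n))        # ||x_n||^2
    return sq / math.sqrt(sq)

ns = [1, 5, 25]
fixed = [fixed_reading(n) for n in ns]
adaptive = [adaptive_reading(n) for n in ns]

for n, a, b in zip(ns, fixed, adaptive):
    print(n, a, b)
```

The fixed readings shrink like $1/(n\pi)$ in absolute value, while the adaptive readings stay at $1/\sqrt{2}$: pointwise convergence over the dual, with no uniform convergence.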
The foliation picture of weak convergence
Each functional $g$ defines a foliation of $X$ into parallel hyperplanes $\{x : \langle g, x \rangle = c\}$. Weak convergence $x_n \rightharpoonup 0$ means that in every foliation, the heights $\langle g, x_n \rangle$ converge to 0. The points $x_n$ jump between different hyperplanes in each foliation, but eventually the heights settle near the origin’s level set.
import numpy as np
import matplotlib.pyplot as plt
# Compute actual inner products for sin(nπt) against several g
t = np.linspace(0, 1, 1000)
dt = t[1] - t[0]
ns_all = np.arange(1, 16)
test_funcs = [
(np.ones_like(t), r'$g_1 = 1$'),
(t, r'$g_2 = t$'),
(np.cos(2*np.pi*t), r'$g_3 = \cos(2\pi t)$'),
]
# Compute heights for each functional
heights = {}
for g_vals, label in test_funcs:
    heights[label] = [np.sum(g_vals * np.sin(n * np.pi * t)) * dt for n in ns_all]
fig, axes = plt.subplots(1, 3, figsize=(15, 5))
colors_pts = plt.cm.viridis(np.linspace(0.1, 0.9, len(ns_all)))
for idx, (g_vals, label) in enumerate(test_funcs):
    ax = axes[idx]
    h = heights[label]
    # Draw the foliation: horizontal lines representing level sets of g
    for c in np.arange(-0.8, 0.9, 0.1):
        alpha = 0.5 if abs(c) < 0.005 else 0.12
        lw_val = 2.0 if abs(c) < 0.005 else 0.6
        ax.axhline(c, color='gray', alpha=alpha, lw=lw_val)
    # The zero level set (the kernel) highlighted
    ax.axhline(0, color='C0', lw=2, alpha=0.6, label=r'$\langle g, 0 \rangle = 0$')
    # Plot each x_n as a point at its height in this foliation
    for i, n in enumerate(ns_all):
        ax.plot(n, h[i], 'o', color=colors_pts[i], ms=7, zorder=5)
    # Connect with a line to show the trajectory
    ax.plot(ns_all, h, '-', color='C4', alpha=0.4, lw=1)
    # Shade the band near 0 to show convergence
    ax.axhspan(-0.05, 0.05, color='C0', alpha=0.08)
    ax.set_xlabel(r'$n$', fontsize=11)
    ax.set_ylabel(r'height $\langle g, x_n \rangle$', fontsize=11)
    ax.set_title(f'Foliation by {label}', fontsize=11)
    ax.legend(fontsize=9, loc='upper right')
    ax.set_ylim(-0.8, 0.8)
    ax.tick_params(labelsize=9)
plt.suptitle(r'Weak convergence of $\sin(n\pi t) \rightharpoonup 0$: heights settle to $0$ in every foliation',
fontsize=12, y=1.02)
plt.tight_layout()
plt.show()
Each panel is a different foliation of $L^2(0,1)$, defined by a functional $g$. The vertical axis is the “height” $\langle g, x_n \rangle$ that the foliation assigns to $x_n$. The points jump around between level sets, but in every foliation the heights converge to 0 (the blue line). The sequence never converges in norm, yet every foliation eventually reads near-zero heights. This is weak convergence.
Strong convergence means the objects cluster in space. Weak convergence means every instrument reading converges, even though the objects may keep moving. The gap is real: $\sin(n\pi t) \rightharpoonup 0$ but $\|\sin(n\pi t)\|_{L^2} = 1/\sqrt{2} \not\to 0$.
In finite dimensions, this gap disappears: weak and strong convergence are equivalent (finitely many instruments suffice to control the norm). The gap is an essentially infinite-dimensional phenomenon.
Weak limits can lose mass
Strong convergence preserves the norm: $x_n \to x$ implies $\|x_n\| \to \|x\|$. Weak convergence does not. The norm can drop in the limit, but it cannot increase.
Proposition 2 (Weak lower semicontinuity of the norm)
Let $X$ be a normed space. If $x_n \rightharpoonup x$ weakly, then
$$\|x\| \leq \liminf_{n \to \infty} \|x_n\|.$$
Proof 2
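By the sup formula, for any $f \in X^*$ with $\|f\| \leq 1$, we have $|f(x_n)| \leq \|x_n\|$. Since $x_n \rightharpoonup x$, the left side converges: $|f(x_n)| \to |f(x)|$. The right side need not converge (the norms may oscillate), but $\liminf_n \|x_n\|$ always exists. If $a_n \leq b_n$ and $\lim a_n$ exists, then $\lim a_n \leq \liminf b_n$. Therefore
$$|f(x)| \leq \liminf_{n \to \infty} \|x_n\|.$$
Taking the supremum over all $f$ with $\|f\| \leq 1$:
$$\|x\| = \sup_{\|f\| \leq 1} |f(x)| \leq \liminf_{n \to \infty} \|x_n\|.$$
This is the price of weak convergence: mass can be lost in the limit. In the example $x_n(t) = \sin(n\pi t)$, the norm stays at $1/\sqrt{2}$ while the weak limit has norm 0, so the inequality can be strict.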
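A second illustration of the norm drop, with a nonzero weak limit, is the sequence $x_n = x + e_n$ in $\ell^2$. The sketch below assumes a finite truncation of $\ell^2$ and particular choices of $x$ and of a fixed test functional $a$ (all illustrative): the reading gap $\langle a, x_n \rangle - \langle a, x \rangle = a_n$ tends to 0, while $\|x_n\|$ stays strictly above $\|x\|$.

```python
import math

dim = 200  # truncation of l^2: an assumption made so the vectors are finite lists

def norm(v):
    return math.sqrt(sum(c * c for c in v))

def inner(a, b):
    return sum(p * q for p, q in zip(a, b))

x = [2.0 ** -k for k in range(dim)]        # x = (1, 1/2, 1/4, ...)
a = [1.0 / (k + 1) for k in range(dim)]    # fixed instrument a = (1, 1/2, 1/3, ...)

def x_n(n):
    """The vector x + e_n (n is 1-indexed)."""
    v = list(x)
    v[n - 1] += 1.0
    return v

ns = (5, 20, 80)
gaps = [inner(a, x_n(n)) - inner(a, x) for n in ns]   # equals a_n = 1/n
norms = [norm(x_n(n)) for n in ns]

print("reading gaps:", gaps)
print("norms of x_n:", norms)
print("norm of weak limit x:", norm(x))
```

Here $\|x\| = \sqrt{4/3}$ while $\|x_n\| \to \sqrt{7/3}$, consistent with $\|x\| \leq \liminf \|x_n\|$ holding with strict inequality.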
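Basic properties of weak convergence
Proposition 3 (Weak limits are unique)
Let $X$ be a normed space. If $x_n \rightharpoonup x$ and $x_n \rightharpoonup y$, then $x = y$.
Proof 3
For every $f \in X^*$, we have $f(x) = \lim_n f(x_n) = f(y)$, so $f(x - y) = 0$ for all $f \in X^*$. By the sup formula, $\|x - y\| = \sup_{\|f\| \leq 1} |f(x - y)| = 0$, so $x = y$.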
Proposition 4 (Weakly convergent sequences are bounded)
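Let $X$ be a Banach space. If $x_n \rightharpoonup x$, then $\sup_n \|x_n\| < \infty$.
Proof 4
Each $x_n$ defines a bounded linear functional $J(x_n)$ on the Banach space $X^*$ via $J(x_n)(f) = f(x_n)$, with $\|J(x_n)\| = \|x_n\|$. For each fixed $f \in X^*$, the sequence $f(x_n)$ converges and is therefore bounded, so the family $\{J(x_n)\}$ is pointwise bounded on $X^*$. The uniform boundedness principle gives $\sup_n \|x_n\| = \sup_n \|J(x_n)\| < \infty$.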
Proposition 5 (Compact operators turn weak convergence into strong convergence)
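Let $X, Y$ be Banach spaces and $T : X \to Y$ a compact linear operator. If $x_n \rightharpoonup x$ in $X$, then $Tx_n \to Tx$ strongly in $Y$.
Proof 5
Since $x_n \rightharpoonup x$, Proposition 4 gives $\sup_n \|x_n\| < \infty$. The sequence $(x_n - x)$ converges weakly to 0 and is bounded, so $\{T(x_n - x)\}$ lies in the image of a bounded set under a compact operator, hence its closure is compact in $Y$.
It suffices to show every subsequence of $(Tx_n)$ has a further subsequence converging to $Tx$. Let $(Tx_{n_k})$ be any subsequence. Since $T$ is compact and $(x_{n_k})$ is bounded, there exists a further subsequence $(x_{n_{k_j}})$ such that $Tx_{n_{k_j}} \to y$ for some $y \in Y$. We identify $y = Tx$: for any $g \in Y^*$, the functional $g \circ T \in X^*$, so
$$g(Tx_{n_{k_j}}) = (g \circ T)(x_{n_{k_j}}) \to (g \circ T)(x) = g(Tx).$$
Since $Tx_{n_{k_j}} \to y$ strongly, also $g(Tx_{n_{k_j}}) \to g(y)$, giving $g(y) = g(Tx)$ for all $g \in Y^*$. By Hahn-Banach, $y = Tx$.
Since every subsequence of $(Tx_n)$ has a further subsequence converging to $Tx$, the full sequence converges: $Tx_n \to Tx$.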
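A standard example of this effect is the diagonal operator $T(x_1, x_2, x_3, \dots) = (x_1, x_2/2, x_3/3, \dots)$ on $\ell^2$, which is compact, applied to the weakly null sequence $(e_n)$. The sketch below assumes a finite truncation (where every operator is compact, so the code only illustrates the decay $\|Te_n\| = 1/n$, not compactness itself); `T`, `e`, and `dim` are illustrative names.

```python
import math

dim = 1000  # truncation of l^2 (an assumption; compactness is an infinite-dimensional notion)

def e(n):
    """Standard basis vector e_n (1-indexed) of length dim."""
    return [1.0 if k == n else 0.0 for k in range(1, dim + 1)]

def T(v):
    """Diagonal operator with entries 1, 1/2, 1/3, ..."""
    return [c / (k + 1) for k, c in enumerate(v)]

def norm(v):
    return math.sqrt(sum(c * c for c in v))

ns = (1, 10, 100, 1000)
input_norms = [norm(e(n)) for n in ns]       # stays at 1: no strong convergence of e_n
image_norms = [norm(T(e(n))) for n in ns]    # equals 1/n: T e_n -> 0 in norm

for n, a, b in zip(ns, input_norms, image_norms):
    print(n, a, b)
```

The identity operator, by contrast, leaves $\|e_n\| = 1$ untouched, which is one way to see that it is not compact on $\ell^2$.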
Weak compactness in reflexive spaces
The whole point of weakening the topology was to gain compactness. With fewer open sets there are fewer open covers, so it becomes easier for a set to be compact. The following theorem makes this precise for reflexive spaces.
Theorem 1 (Weak compactness of the unit ball)
Let $X$ be a reflexive Banach space. Then the closed unit ball $\bar{B}_1(0)$ is compact in the weak topology. In particular, every bounded sequence in $X$ has a weakly convergent subsequence.
The proof uses the Banach-Alaoglu theorem (Theorem 1) and the canonical embedding $J$; see Corollary 1.
This theorem is the reason reflexivity matters in applications. In reflexive spaces like $L^p$ ($1 < p < \infty$) and Sobolev spaces $W^{k,p}$ ($1 < p < \infty$), every bounded sequence has a weakly convergent subsequence. This is the compactness step in the direct method of the calculus of variations: minimize an energy functional over a bounded set, extract a weakly convergent subsequence, and pass to the limit.
What goes wrong without reflexivity
Why does the theorem require reflexivity? If weak compactness held for every Banach space, then every bounded sequence would have a weakly convergent subsequence. Consider $c_0$ with the duality chain $c_0^* = \ell^1$, $(\ell^1)^* = \ell^\infty$ (recall Example 5). The space $c_0$ is not reflexive: the canonical embedding $J : c_0 \to \ell^\infty$ misses elements like $(1, 1, 1, \dots)$.
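Consider the bounded sequence $s_n = e_1 + \dots + e_n = (1, \dots, 1, 0, 0, \dots)$, with $\|s_n\|_\infty = 1$. Can we extract a weakly convergent subsequence? In fact the full sequence already has the property that the readings converge: for any $f = (a_k) \in \ell^1$,
$$f(s_n) = \sum_{k=1}^{n} a_k,$$
which is a partial sum of the absolutely convergent series $\sum_k a_k$ (since $f \in \ell^1$), so $f(s_n) \to \sum_{k=1}^{\infty} a_k$.
What we want: a subsequence $(s_{n_j})$ and some $s \in c_0$ such that $f(s_{n_j}) \to f(s)$ for every $f \in \ell^1$.
What this requires: we already know $f(s_n) \to \sum_{k=1}^{\infty} a_k$ for every $f \in \ell^1$. Since a subsequence of a convergent sequence converges to the same limit, $f(s_{n_j}) \to \sum_{k=1}^{\infty} a_k$ as well. So the candidate $s$ must satisfy $f(s) = \sum_{k=1}^{\infty} a_k$ for every $f = (a_k) \in \ell^1$.
Why the only candidate is not in $c_0$: choose $f = e_k \in \ell^1$, which reads the $k$-th coordinate. On one hand, $f(s)$ is the $k$-th coordinate of $s$. On the other hand, $\sum_j a_j = 1$. So the $k$-th coordinate of $s$ equals 1 for every $k$, forcing $s = (1, 1, 1, \dots)$. But this constant sequence does not converge to zero, so $s \notin c_0$. No subsequence of $(s_n)$ converges weakly in $c_0$.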
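Where does this candidate live? The canonical embedding $J : c_0 \to c_0^{**} = \ell^\infty$ identifies each element of $c_0$ with an evaluation functional on $\ell^1$. The candidate $\mathbf{1} = (1, 1, 1, \dots)$ is a bounded sequence, so it defines a valid element of $\ell^\infty = (\ell^1)^*$. It acts on $\ell^1$ by
$$\mathbf{1}(f) = \sum_{k=1}^{\infty} a_k, \qquad f = (a_k) \in \ell^1.$$
This is a perfectly good element of the bidual $c_0^{**}$, but it is not in the range of $J$. Explicitly:
$$\mathbf{1} \in \ell^\infty \setminus J(c_0),$$
since the constant sequence does not converge to zero. So $J$ is not surjective: $J(c_0) \subsetneq c_0^{**}$. The sequence $(s_n)$ has “escaped” into the bidual: the readings converge, but the limit lives in $c_0^{**} \setminus J(c_0)$.
Reflexivity means $J(X) = X^{**}$: there is no gap, so there is no room to escape. In a reflexive space, every candidate limit that is consistent with the readings already lives in $X$.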
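The readings in the $c_0$ example can be tabulated directly. The sketch below assumes one particular instrument $f = (1/k^2) \in \ell^1$ (an illustrative choice) together with the coordinate instruments $e_k$; `reading` and `coordinate_reading` are hypothetical helper names.

```python
import math

def a(k):
    """Coefficients of the instrument f = (1/k^2), which lies in l^1."""
    return 1.0 / k ** 2

def reading(n):
    """f(s_n) = a_1 + ... + a_n for s_n = (1, ..., 1, 0, 0, ...)."""
    return sum(a(k) for k in range(1, n + 1))

def coordinate_reading(k, n):
    """e_k(s_n): the k-th coordinate of s_n, which is 1 once n >= k."""
    return 1.0 if k <= n else 0.0

readings = {n: reading(n) for n in (10, 100, 10000)}
limit = math.pi ** 2 / 6   # sum of 1/k^2 over all k

for n, val in readings.items():
    print(n, val, limit - val)

# Every coordinate of the would-be limit equals 1, so it cannot lie in c_0
forced_coordinates = [coordinate_reading(k, 10 ** 6) for k in (1, 10, 1000)]
print(forced_coordinates)
```

The readings converge to $\pi^2/6 = \mathbf{1}(f)$, and every coordinate instrument eventually reads 1, so the only consistent limit is the constant sequence $\mathbf{1}$.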
Separation and convexity in the weak topology
This section explores the topological separation properties of the weak topology and their connection to Mazur’s theorem. It is not needed for the main development but clarifies why convexity plays such a distinguished role.
Separation axioms: from $T_2$ to $T_4$
The separation axioms form a hierarchy of increasing strength:
$T_2$ (Hausdorff): Any two distinct points can be separated by disjoint open sets.
$T_3$ (regular Hausdorff): Any point and closed set not containing it can be separated by disjoint open sets.
$T_{3\frac{1}{2}}$ (completely regular / Tychonoff): Any point and closed set not containing it can be separated by a continuous function $\varphi : X \to [0, 1]$.
$T_4$ (normal Hausdorff): Any two disjoint closed sets can be separated by disjoint open sets.
Each level upgrades what can be separated: $T_2$ separates points from points, $T_3$ separates points from closed sets by open sets, $T_{3\frac{1}{2}}$ strengthens this to separation by continuous functions, and $T_4$ separates closed sets from closed sets.
Proposition 6 (Metric spaces are $T_4$)
Every metric space is $T_4$.
Proof 6
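Given disjoint closed sets $A, B$, the function
$$\varphi(x) = \frac{d(x, A)}{d(x, A) + d(x, B)}$$
is continuous with $\varphi = 0$ on $A$ and $\varphi = 1$ on $B$ (this works because $d(x, A) + d(x, B) > 0$ when $A \cap B = \emptyset$ and both are closed), so $\{\varphi < 1/2\}$ and $\{\varphi > 1/2\}$ are disjoint open sets separating $A$ and $B$.
Since every normed space is a metric space, every Banach space is $T_4$ in its norm topology. When we pass to the weak topology, we lose the metric but the topology is still Hausdorff ($T_2$), since $X^*$ separates points.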
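The separating function is simple enough to evaluate by hand. The sketch below assumes the metric space is $\mathbb{R}$ and takes $A = [0, 1]$, $B = [2, 3]$ as illustrative closed sets; `dist_to_interval` and `phi` are hypothetical helper names.

```python
def dist_to_interval(x, lo, hi):
    """Distance from the real number x to the closed interval [lo, hi]."""
    return max(lo - x, 0.0, x - hi)

A = (0.0, 1.0)   # closed set A = [0, 1]
B = (2.0, 3.0)   # closed set B = [2, 3], disjoint from A

def phi(x):
    """phi(x) = d(x, A) / (d(x, A) + d(x, B)); the denominator is never zero."""
    d_a = dist_to_interval(x, *A)
    d_b = dist_to_interval(x, *B)
    return d_a / (d_a + d_b)

for x in (-1.0, 0.5, 1.2, 1.5, 1.8, 2.5, 4.0):
    side = "U = {phi < 1/2}" if phi(x) < 0.5 else ("V = {phi > 1/2}" if phi(x) > 0.5 else "boundary")
    print(x, phi(x), side)
```

Points of $A$ get the value 0, points of $B$ get the value 1, and the midpoint $1.5$ sits exactly on the boundary between the two separating open sets.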
Mazur’s theorem: convexity restores separation
The natural question is: can we separate points from closed sets, not just from other points? The weak topology is defined so that every $f \in X^*$ is continuous, and the Hahn-Banach theorem produces such an $f$ that strictly separates a point from a norm-closed convex set. So for convex sets, the functionals in $X^*$ play exactly the role of the separating continuous functions required by the $T_{3\frac{1}{2}}$ axiom. Mazur’s theorem makes this precise: for convex sets, norm-closure and weak-closure agree.
Theorem 2 (Mazur’s theorem)
Let $X$ be a normed space and $C \subseteq X$ a convex set. Then $C$ is norm-closed if and only if it is weakly closed.
Proof 7
Since the weak topology is coarser than the norm topology, every weakly closed set is norm-closed. The content is the converse: a norm-closed convex set is weakly closed.
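It suffices to show that if $x_0 \notin C$ and $C$ is norm-closed and convex, then $x_0$ is not in the weak closure of $C$, i.e., there is a weakly open set containing $x_0$ that misses $C$.
Since $C$ is norm-closed and $x_0 \notin C$, there exists $\delta > 0$ with $B_\delta(x_0) \cap C = \emptyset$. By the geometric form of the Hahn-Banach theorem (strict separation of a point from a closed convex set), there exists $f \in X^*$ and $\alpha \in \mathbb{R}$ such that
$$f(x_0) < \alpha < f(y) \quad \text{for all } y \in C.$$
The set $\{x : f(x) < \alpha\}$ is a weakly open half-space containing $x_0$ and disjoint from $C$. Therefore $x_0$ is not in the weak closure of $C$.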
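In a Hilbert space the separating functional can be written down explicitly, which gives a concrete picture of the proof. The sketch below assumes $X = \mathbb{R}^2$ with the Euclidean norm, $C$ the closed unit disc, and an arbitrary outside point `x0`; the choice $f(y) = -\langle y, x_0/\|x_0\| \rangle$ and the threshold `alpha` are specific to this toy case (Hahn-Banach itself is not constructive in general).

```python
import math

x0 = (1.5, 2.0)                       # a point outside the closed unit disc C
r = math.hypot(*x0)                   # ||x0|| = 2.5
u = (x0[0] / r, x0[1] / r)            # unit vector pointing from 0 towards x0

def f(y):
    """Separating functional f(y) = -<y, u>; on C it satisfies f(y) >= -1."""
    return -(y[0] * u[0] + y[1] * u[1])

alpha = -(1.0 + r) / 2.0              # any value strictly between -||x0|| and -1 works

# Sample the boundary circle of C, where f is smallest over C
samples = [(math.cos(2 * math.pi * k / 720), math.sin(2 * math.pi * k / 720))
           for k in range(720)]
min_on_C = min(f(y) for y in samples)

print("f(x0) =", f(x0))
print("alpha =", alpha)
print("min of f over sampled boundary of C =", min_on_C)
```

The printed values satisfy $f(x_0) < \alpha < \min_C f$, so the half-space $\{f < \alpha\}$ contains `x0` and misses the disc, as in the proof.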
Mazur’s theorem says that for convex sets, you cannot tell the difference between norm-closure and weak-closure. This is why the closed unit ball, closed convex hulls, and closed subspaces are all weakly closed. Non-convex sets do not enjoy this protection: the set $\{e_n\}$ in $\ell^2$ is norm-closed but not weakly closed.
The connection to separation is now clear: the weak topology is not metrizable in infinite dimensions, so the metric argument for $T_4$ (separation of arbitrary closed sets) is no longer available, but Hahn-Banach gives separation of points from closed convex sets. Mazur’s theorem is the payoff: convexity is a condition under which the weak topology’s separation power suffices.