Sobolev Embeddings

Big Idea

Bounded sequences in $L^2$ can fail to have convergent subsequences in two ways: oscillation (mass escaping to high frequencies) and translation (mass escaping to spatial infinity). Controlling a derivative kills oscillation; working on a bounded domain kills translation. Together they restore Bolzano-Weierstrass. This is Rellich-Kondrachov compactness.

Along the way, controlling derivatives also upgrades integrability. The Sobolev embedding theorem makes this quantitative: $s$ derivatives in $L^p$ buy integrability up to $L^{p^*}$ (or continuity if $sp > d$), with exchange rate $1/p^* = 1/p - s/d$ set by an uncertainty principle.

Motivation: when does a bounded sequence converge?

In finite dimensions, Bolzano-Weierstrass guarantees that every bounded sequence has a convergent subsequence. In infinite-dimensional spaces like $L^2(\Omega)$ this fails dramatically, and understanding how it fails is the key to the whole chapter.

Consider a bounded sequence $\{u_n\} \subset L^2(\Omega)$, say $\|u_n\|_{L^2} \leq 1$. What can go wrong? There are essentially two pathologies:

(P1) Oscillation, where energy escapes to high frequencies. Take $\Omega = (0,1)$ and $u_n(x) = \sqrt{2}\,\sin(n\pi x)$. Each $u_n$ has $\|u_n\|_{L^2} = 1$, but any two distinct terms are orthogonal, so $\|u_n - u_m\|_{L^2} = \sqrt{2}$ for $n \neq m$. No subsequence is Cauchy. The mass does not escape in space; instead it escapes into ever higher frequencies.

(P2) Translation, where energy escapes to spatial infinity. On $\Omega = \mathbb{R}^d$, fix a bump $\varphi \in C_c^\infty$ with $\|\varphi\|_{L^2} = 1$ and set $u_n(x) = \varphi(x - n e_1)$. Each $u_n$ has norm 1, but the supports become disjoint, so again $\|u_n - u_m\|_{L^2} = \sqrt{2}$. The mass slides off to infinity.
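The claim in (P1) that distinct modes sit at distance $\sqrt{2}$ from each other can be checked numerically. The following is a minimal sketch using a midpoint rule; the helper names `l2_distance` and `oscillating_mode` are illustrative and not part of the chapter.

```python
import math

def l2_distance(f, g, a=0.0, b=1.0, steps=20000):
    """Midpoint-rule approximation of the L^2(a, b) distance between f and g."""
    h = (b - a) / steps
    total = 0.0
    for i in range(steps):
        x = a + (i + 0.5) * h
        total += (f(x) - g(x)) ** 2
    return math.sqrt(total * h)

def oscillating_mode(n):
    """u_n(x) = sqrt(2) * sin(n * pi * x), the sequence from (P1)."""
    return lambda x: math.sqrt(2.0) * math.sin(n * math.pi * x)

def zero(x):
    return 0.0

# Norm of a single mode, and distance between two distinct modes.
norm_u3 = l2_distance(oscillating_mode(3), zero)
dist_3_7 = l2_distance(oscillating_mode(3), oscillating_mode(7))
print(norm_u3, dist_3_7)
```

Any other pair of distinct indices should give the same distance, which is why no subsequence can be Cauchy.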

These two examples essentially exhaust the ways compactness can fail in $L^p$: any non-precompact bounded sequence is, up to subsequence, a combination of oscillating modes and escaping bumps. The whole chapter can be read as a program to defeat these two pathologies and recover Bolzano-Weierstrass:

  • Controlling a derivative defeats oscillation (P1).

  • Working on a bounded domain defeats translation (P2).

Doing both yields Rellich-Kondrachov compactness: the embedding $H^1(\Omega) \hookrightarrow L^2(\Omega)$ is compact on bounded $\Omega$. Along the way we will also see how controlling derivatives not only rescues compactness but upgrades integrability, climbing the ladder toward $L^\infty$ and Hölder continuity. This is the Sobolev embedding theorem.

From Sobolev regularity to integrability

The Sobolev norm controls more than just derivatives; it forces the function to have better integrability and even continuity, depending on how many derivatives are controlled.

Before making this quantitative, it is worth asking why we want integrability control at all. The answer points to a single goal: we are trying to climb the ladder of ever smaller spaces

$$L^p \;\supset\; L^q \;\supset\; \cdots \;\supset\; L^\infty \;\supset\; C^0 \;\supset\; C^{0,\alpha} \qquad (p < q,\ \Omega \text{ bounded}),$$

because the top of the ladder is where classical tools live. Uniformly bounded, equicontinuous families of continuous functions are precompact by Arzelà-Ascoli (Theorem 1); continuity lets us evaluate $u$ at a point, impose classical boundary values, and make sense of nonlinear terms like $u^q$ pointwise. The $L^p$ rungs in between are what we settle for when we cannot reach the top. The Sobolev embedding theorem tells us exactly how high we can climb given a fixed budget of derivatives.

The simplest instance is the Poincaré inequality, which in one dimension turns out to climb all the way.

The Poincaré inequality: the prototype embedding

Everything in the Sobolev embedding theorem can be read off a single one-dimensional calculation. We do that calculation here, and the result is the prototype: a single pointwise estimate, obtained from the fundamental theorem of calculus and Cauchy-Schwarz, produces three regularity statements at once,

$$\text{$L^2$ control} \;\rightarrow\; \text{$L^\infty$ control} \;\rightarrow\; \text{Hölder control}.$$

Every higher-dimensional Sobolev embedding is a way of running this same template in $d$ directions.

The single estimate. Take $\Omega = (0, L)$ and $u \in H^1_0(0, L)$, so $u(0) = 0$. The fundamental theorem of calculus gives the representation $u(x) = \int_0^x u'(t)\,dt$, and Cauchy-Schwarz turns it into a pointwise bound:

$$|u(x)|^2 \;=\; \Bigl|\int_0^x 1 \cdot u'(t)\,dt\Bigr|^2 \;\leq\; x \int_0^x |u'(t)|^2\,dt \;\leq\; L\,\|u'\|_{L^2}^2. \tag{$\star$}$$

(Strictly speaking, the FTC representation holds pointwise for $u \in C^1$ with $u(0)=0$; the estimate $(\star)$ then extends to all of $H^1_0(0,L)$ by density, and simultaneously shows that each $u \in H^1_0$ has a continuous representative for which $u(x)=\int_0^x u'(t)\,dt$ holds.)

That is the engine. The three regularity statements all fall out by reading $(\star)$ in different ways.

Rung 1: $L^2$ control (the Poincaré inequality). Integrate $(\star)$ in $x$ over $(0, L)$:

$$\|u\|_{L^2}^2 \leq L^2\,\|u'\|_{L^2}^2, \qquad \|u\|_{L^2} \leq L\,\|u'\|_{L^2}.$$

The gradient controls the function in mean square.

Rung 2: $L^\infty$ control. Take the supremum over $x$ in $(\star)$:

$$\|u\|_{L^\infty(0,L)} \leq L^{1/2}\,\|u'\|_{L^2}.$$

The gradient controls the function pointwise.

Rung 3: Hölder control. Apply the same Cauchy-Schwarz argument to the difference $u(x) - u(y) = \int_y^x u'(t)\,dt$:

$$|u(x) - u(y)| \leq |x - y|^{1/2}\,\|u'\|_{L^2},$$

so $u \in C^{0,1/2}(\overline{\Omega})$ with seminorm bounded by $\|u'\|_{L^2}$. The gradient controls the function’s continuity.

Three regularity classes, one estimate. The classical Poincaré inequality is just rung 1, and we record it formally.

Theorem 1 (Poincaré inequality on $H^1_0$)

Let $\Omega \subset \mathbb{R}^d$ be a bounded, connected open set with Lipschitz boundary. There exists a constant $C = C(\Omega)$ such that for all $u \in H^1_0(\Omega)$,

$$\|u\|_{L^2(\Omega)} \leq C\,\|\nabla u\|_{L^2(\Omega)}.$$

In one dimension this trade is lossless: a single derivative in $L^2$ buys all the way to Hölder continuity, the top of the integrability ladder. Rellich compactness then comes for free, since a bounded set in $H^1_0(0,L)$ is uniformly bounded and equicontinuous, and Arzelà-Ascoli (Theorem 1) delivers a $C^0$-convergent subsequence. The motivating oscillation pathology $u_n(x) = \sqrt{2}\sin(n\pi x)$ is automatically excluded: its $H^1_0$ norm $\|u_n'\|_{L^2} = n\pi$ blows up, so $\{u_n\}$ never stays inside any bounded $H^1_0$ ball in the first place.
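As a sanity check on the three rungs, one can compare both sides of each inequality for a concrete function. The sketch below assumes the sample $u(x) = x(L - x)$ on $(0, L)$ with $L = 2$ (a choice made here for illustration) and uses a midpoint rule; `rung_check` is a hypothetical helper name.

```python
import math

def rung_check(u, du, L, steps=2000):
    """Numerically compare both sides of the three rungs for u in H^1_0(0, L)."""
    h = L / steps
    mids = [(i + 0.5) * h for i in range(steps)]
    norm_u = math.sqrt(sum(u(x) ** 2 for x in mids) * h)
    norm_du = math.sqrt(sum(du(x) ** 2 for x in mids) * h)
    sup_u = max(abs(u(x)) for x in mids)
    # Hölder quotient |u(x) - u(y)| / |x - y|^(1/2) over a coarse grid of pairs.
    coarse = mids[::40]
    holder = max(
        abs(u(x) - u(y)) / math.sqrt(abs(x - y))
        for x in coarse for y in coarse if x != y
    )
    return {
        "poincare": (norm_u, L * norm_du),          # rung 1
        "sup": (sup_u, math.sqrt(L) * norm_du),     # rung 2
        "holder": (holder, norm_du),                # rung 3
    }

L = 2.0
checks = rung_check(lambda x: x * (L - x), lambda x: L - 2.0 * x, L)
for name, (lhs, rhs) in checks.items():
    print(f"{name}: {lhs:.4f} <= {rhs:.4f}")
```

In each pair the left entry is the quantity being controlled and the right entry is the bound supplied by $\|u'\|_{L^2}$; none of the three bounds is sharp for this sample.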

The variance picture

The $L^2$ rung has a probabilistic flavor: $\|u - \bar u\|_{L^2}^2$ is, up to the factor $|\Omega|$, the variance of $u$ around its mean $\bar u$. The plot below shows a typical $H^1_0$ function on $(0, 1)$ with its mean line; each arrow has length $|u(x) - \bar u|$, and the integral of the squared arrow lengths is $\|u - \bar u\|_{L^2}^2$. Poincaré says this total cannot exceed $C^2\|\nabla u\|_{L^2}^2$: a flatter function (small gradient) means short arrows, hence small variance.

[Figure: a typical $H^1_0$ function on $(0, 1)$ with its mean line and arrows of length $|u(x) - \bar u|$.]

The $H^1_0$ statement of Poincaré bounds $\|u\|_{L^2}$ rather than just $\|u - \bar u\|_{L^2}$, and the mean does not appear. This is sometimes called “the mean drops out,” but it deserves a careful reading. The Pythagorean decomposition

$$\|u\|_{L^2}^2 \;=\; \underbrace{\|u - \bar u\|_{L^2}^2}_{\text{variance}} \;+\; \underbrace{|\bar u|^2\,|\Omega|}_{\text{mean energy}}$$

splits the squared norm into orthogonal pieces. On all of $H^1$, the gradient controls only the variance: a constant function has $\nabla u = 0$ but arbitrary mean. On $H^1_0$, the boundary condition forces $u$ back to zero at $\partial\Omega$, so a small gradient also prevents the function from sitting at a high plateau; the mean energy $|\bar u|^2 |\Omega|$ is itself bounded by the gradient. Indeed, on $\Omega = (0, L)$, FTC with $u(0)=0$ and Cauchy-Schwarz give $|u(x)| \leq L^{1/2}\|u'\|_{L^2}$ pointwise, so $|\bar u| \leq L^{1/2}\|u'\|_{L^2}$ and thus $|\bar u|^2|\Omega| \leq L^2\|u'\|_{L^2}^2$, with no appeal to Poincaré needed. Both orthogonal pieces are then controlled, and they combine into the cleaner $\|u\|_{L^2}^2 \leq C^2\|\nabla u\|_{L^2}^2$. The mean is not made zero (a typical $H^1_0$ function has $\bar u \neq 0$); the boundary condition has just made it controllable in the same breath as the variance.
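The Pythagorean split can be verified numerically. This is a small sketch, assuming the sample function $u(x) = \sin(\pi x)$ on $(0,1)$ (chosen here only for illustration); `mean_variance_split` is a hypothetical helper.

```python
import math

def mean_variance_split(u, L, steps=20000):
    """Return (||u||^2, ||u - mean||^2, mean^2 * |Omega|) on Omega = (0, L)."""
    h = L / steps
    mids = [(i + 0.5) * h for i in range(steps)]
    mean = sum(u(x) for x in mids) * h / L
    total = sum(u(x) ** 2 for x in mids) * h
    variance = sum((u(x) - mean) ** 2 for x in mids) * h
    mean_energy = mean ** 2 * L
    return total, variance, mean_energy

total, variance, mean_energy = mean_variance_split(lambda x: math.sin(math.pi * x), 1.0)
print(total, variance, mean_energy)
```

For this sample most of the squared norm sits in the mean energy rather than the variance, which illustrates that an $H^1_0$ function need not have small mean.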

Remark 1 (The mean-free Poincaré-Wirtinger version on $H^1$)

Without a boundary condition, the gradient cannot detect constants ($\nabla(u + c) = \nabla u$), so any inequality of the form $\|u\|_{L^2} \leq C\|\nabla u\|_{L^2}$ fails on $H^1$. The fix is to subtract the mean,

$$\|u - \bar u\|_{L^2(\Omega)} \leq C\,\|\nabla u\|_{L^2(\Omega)}, \qquad \bar u = \frac{1}{|\Omega|}\int_\Omega u\,dx,$$

which projects $u$ onto the orthogonal complement of constants, where $\nabla$ is injective. This is the Poincaré-Wirtinger inequality. On a general domain its proof requires a different technique (typically compactness and contradiction); the FTC argument above, which started from $u(0) = 0$, does not carry over as written. We do not use this version in this chapter.

The frequency picture as confirmation

The same ladder appears in Fourier coordinates. On $(0, L)$ with zero boundary values, the eigenfunctions of $-\Delta$ are $\sin(n\pi x/L)$ with eigenvalues $(n\pi/L)^2$. These form an orthogonal basis of $L^2(0,L)$, and the same family, suitably normalized by $n\pi/L$, is an orthogonal basis of $H^1_0(0,L)$. So for $u \in H^1_0(0,L)$ we may write $u = \sum_n \hat u_n \sin(n\pi x/L)$ with convergence in both $L^2$ and $H^1_0$. Since $\|\sin(n\pi x/L)\|_{L^2}^2 = L/2$, Parseval reads

$$\|u\|_{L^2}^2 = \tfrac{L}{2}\sum_n |\hat u_n|^2, \qquad \|\nabla u\|_{L^2}^2 = \tfrac{L}{2}\sum_n \tfrac{n^2\pi^2}{L^2}\,|\hat u_n|^2.$$

The common factor $L/2$ cancels or is absorbed into constants in the estimates that follow.

Rung 1: $L^2$ control (Poincaré). Term by term the coefficient inequality is immediate (since $n \geq 1$), and summing in $n$ gives Poincaré:

$$|\hat u_n|^2 \leq \tfrac{L^2}{\pi^2} \tfrac{n^2\pi^2}{L^2} |\hat u_n|^2 \quad\Longrightarrow\quad \|u\|_{L^2}^2 \leq \tfrac{L^2}{\pi^2} \|\nabla u\|_{L^2}^2.$$

The optimal constant $C = L/\pi$ is attained by the lowest mode $\sin(\pi x/L)$, the slowest-oscillating eigenfunction.
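To see that the lowest mode attains the constant, here is the computation written out for $u_1(x) = \sin(\pi x/L)$ (a worked check, using only the standard integrals of $\sin^2$ and $\cos^2$ over a half period):

```latex
\|u_1\|_{L^2}^2 = \int_0^L \sin^2\!\Bigl(\frac{\pi x}{L}\Bigr)\,dx = \frac{L}{2},
\qquad
\|u_1'\|_{L^2}^2 = \frac{\pi^2}{L^2}\int_0^L \cos^2\!\Bigl(\frac{\pi x}{L}\Bigr)\,dx = \frac{\pi^2}{2L},
\qquad
\frac{\|u_1\|_{L^2}^2}{\|u_1'\|_{L^2}^2} = \frac{L^2}{\pi^2}.
```

Every higher mode $\sin(n\pi x/L)$ gives the smaller ratio $L^2/(n^2\pi^2)$, so the inequality is strict away from the first eigenfunction.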

Rung 2: $L^\infty$ control. Bound the series pointwise by its coefficients and apply Cauchy-Schwarz on the index $n$:

$$|u(x)| = \Bigl|\sum_n \hat u_n \sin(n\pi x/L)\Bigr| \leq \sum_n |\hat u_n| \leq \Bigl(\sum_n \tfrac{L^2}{n^2\pi^2}\Bigr)^{1/2} \Bigl(\sum_n \tfrac{n^2\pi^2}{L^2}|\hat u_n|^2\Bigr)^{1/2} = C_\infty\,\|\nabla u\|_{L^2}.$$

The convergent sum $\sum n^{-2}$ is what makes the bound finite, and it is exactly the Fourier shadow of the $\int_0^x 1 \cdot u'\,dt$ step in $(\star)$.

Rung 3: Hölder control. Expand the difference and use $|\sin a - \sin b| \leq \min(2,\,|a-b|)$ termwise, then Cauchy-Schwarz:

$$|u(x) - u(y)| \leq \sum_n |\hat u_n|\,\min\!\Bigl(2,\,\tfrac{n\pi|x-y|}{L}\Bigr) \lesssim \Bigl(\sum_n \tfrac{\min(2,\,n\pi|x-y|/L)^2}{n^2\pi^2/L^2}\Bigr)^{1/2} \|\nabla u\|_{L^2} \lesssim |x-y|^{1/2}\,\|\nabla u\|_{L^2}.$$

Split the last sum at the crossover $n \sim L/(\pi|x-y|)$ where the two arguments of $\min$ cross. Low-frequency modes ($n$ below the crossover) are still in the smooth regime $|\sin a - \sin b| \approx |a-b|$, so they contribute the $|x-y|$-factor; high-frequency modes ($n$ above the crossover) oscillate too fast to resolve the displacement $|x-y|$, so only the crude bound 2 is available. Balancing the two halves gives the $|x-y|^{1/2}$ exponent.
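The balancing step can be made explicit. Writing $h = |x - y|$ and $n_* = L/(\pi h)$, and assuming $h \leq L/\pi$ so that $n_* \geq 1$ (the notation $h$, $n_*$ is introduced here for the sketch), the two halves of the sum are estimated as follows:

```latex
\sum_{n \le n_*} \frac{(n\pi h/L)^2}{n^2\pi^2/L^2}
  = \sum_{n \le n_*} h^2 \;\le\; n_*\,h^2 = \frac{L\,h}{\pi},
\qquad
\sum_{n > n_*} \frac{4}{n^2\pi^2/L^2}
  = \frac{4L^2}{\pi^2}\sum_{n > n_*}\frac{1}{n^2}
  \;\le\; \frac{4L^2}{\pi^2}\cdot\frac{2}{n_*} = \frac{8\,L\,h}{\pi}.
```

Both halves are of order $L\,h$, so the square root of their sum is of order $(L\,h)^{1/2}$, which is the $|x-y|^{1/2}$ factor. For $h > L/\pi$ the Hölder bound follows directly from the $L^\infty$ rung.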

So all three rungs are visible in Fourier, each as a different way of weighting $\sum n^2 |\hat u_n|^2$ against a bounded or summable weight in $n$, such as $n^{-2}$.

$L^2$-compactness (Rellich-Kondrachov in 1D). The same expansion delivers $L^2$-compactness in one line. Let $B = \{u : \|u\|_{H^1_0} \leq M\}$, and let $P_N u = \sum_{n \leq N} \hat u_n \sin(n\pi x/L)$ be the projection onto the first $N$ modes. Then $P_N u$ lives in the finite-dimensional subspace $V_N = \operatorname{span}\{\sin(n\pi x/L) : n \leq N\}$, in which bounded sets are precompact, while the tail is uniformly small in $L^2$ (the factor $L/2$ below is $\|\sin(n\pi x/L)\|_{L^2}^2$):

$$\|u - P_N u\|_{L^2}^2 \;=\; \tfrac{L}{2}\sum_{n > N} |\hat u_n|^2 \;\leq\; \tfrac{L^2}{N^2\pi^2} \cdot \tfrac{L}{2}\sum_{n > N} \tfrac{n^2\pi^2}{L^2}|\hat u_n|^2 \;\leq\; \tfrac{L^2 M^2}{N^2\pi^2} \;\xrightarrow{N\to\infty}\; 0.$$

So $B$ is totally bounded in $L^2$ (covered by finitely many balls of any prescribed radius, using the precompactness of bounded sets in $V_N$ for the head), hence precompact. This is Rellich-Kondrachov in 1D.

The same ball $B$ is also equicontinuous in $C([0,L])$ by the Hölder rung ($|u(x)-u(y)| \leq M|x-y|^{1/2}$ uniformly in $u \in B$) and uniformly bounded by the $L^\infty$ rung, so Arzelà-Ascoli gives uniform-norm precompactness, and the continuous embedding $C \hookrightarrow L^2$ transfers this to $L^2$-precompactness too. The two proofs are the same phenomenon seen twice: decay of high-frequency modes and uniform equicontinuity are dual descriptions of “the bounded $H^1_0$ ball has no room to escape to infinity in $L^2$.”
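The uniform tail bound can be watched in action on a concrete function. The sketch below assumes $L = 1$ and the sample $u(x) = x(1-x)$, whose sine coefficients are $8/(n\pi)^3$ for odd $n$ and zero for even $n$; the infinite tail is truncated at a large index `n_max`, which is an approximation.

```python
import math

def sine_coefficient(n):
    """Coefficient of sin(n*pi*x) for u(x) = x(1 - x) on (0, 1)."""
    return 8.0 / (n * math.pi) ** 3 if n % 2 == 1 else 0.0

def tail_energy(N, n_max=20000):
    """||u - P_N u||_{L^2}^2 = (1/2) * sum_{n > N} |u_n|^2, truncated at n_max."""
    return 0.5 * sum(sine_coefficient(n) ** 2 for n in range(N + 1, n_max + 1))

grad_energy = 1.0 / 3.0  # ||u'||_{L^2}^2 = integral of (1 - 2x)^2 over (0, 1)

for N in (1, 2, 5, 10, 50):
    bound = grad_energy / (N * math.pi) ** 2  # L^2 M^2 / (N^2 pi^2) with L = 1
    print(N, tail_energy(N), bound)
```

For this smooth sample the actual tail decays much faster than the bound; the point of the bound is that it holds uniformly over the whole ball, not that it is sharp for any one function.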

The uncertainty principle and the general theorem

The Sobolev embedding theorem says that controlling $k$ derivatives in $L^p$ forces membership in $L^{p^*}$ with $1/p^* = 1/p - k/d$. Where does this critical exponent come from? The answer is an uncertainty principle: a function cannot be simultaneously localized in both space and frequency.

Two regimes of the model function

Fix a smooth bump $\phi$ on $\mathbb{R}^d$ with support in the unit ball, and a unit vector $e \in \mathbb{R}^d$. The model function is

$$f_{A,R,N}(x) = A\,\phi(x/R)\,\sin(N\,e\cdot x), \qquad x \in \mathbb{R}^d,$$

a bump of height $A$, spatial scale $R$, oscillating at frequency $N$ along direction $e$. Its $L^p$ norms scale with the volume $R^d$ of the support:

$$\|f\|_{L^p} \;\sim\; A\,R^{d/p}.$$

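Its gradient has two competing terms:

$$\nabla f(x) \;=\; \underbrace{\tfrac{A}{R}\,(\nabla\phi)(x/R)\,\sin(N\,e\cdot x)}_{\text{envelope term}} \;+\; \underbrace{A N\,\phi(x/R)\,\cos(N\,e\cdot x)\,e}_{\text{oscillation term}},$$

of sizes at most $A\,R^{d/p-1}$ and $A N\,R^{d/p}$ in $L^p$, respectively.

Which dominates depends on the relationship between $R$ and $N$: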

Well-behaved regime: $RN \geq 1$. The oscillation term dominates: $\|\nabla f\|_{L^p} \sim A N\,R^{d/p}$. The function completes many oscillations within its support, and derivatives faithfully measure the frequency. The Sobolev norm behaves as the heuristic predicts: $\|f\|_{W^{1,p}} \approx A\,R^{d/p}\,N$.

Ill-behaved regime: $RN < 1$. The envelope scale takes over: $\|\nabla f\|_{L^p} \lesssim A\,R^{d/p - 1}$, with $\sim$ for a generic phase (for instance $\cos$ in place of $\sin$; with $\sin$ itself the function is additionally damped by the factor $RN$ on its support). The function is so narrow that it does not complete even one oscillation. The “frequency” $N$ is invisible, and the gradient is controlled by the compression $1/R$ of the bump.

The following plot illustrates both regimes in the 1D slice along direction $e$ (so $d = 1$ for the picture; the analysis above is in general $\mathbb{R}^d$).

[Figure: the model function in the well-behaved ($RN \geq 1$) and ill-behaved ($RN < 1$) regimes, 1D slice along $e$; four panels.]
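The well-behaved scaling $\|\nabla f\|_{L^p} \sim A N R^{d/p}$ can be tested numerically in the 1D slice with $p = 2$. The sketch below is an illustration under assumptions made here: the specific bump $\phi(t) = \exp(-1/(1-t^2))$, the values $A = R = 1$, and a midpoint rule. It is not the code behind the figure.

```python
import math

def bump(t):
    """Smooth bump supported in (-1, 1)."""
    return math.exp(-1.0 / (1.0 - t * t)) if abs(t) < 1.0 else 0.0

def bump_prime(t):
    """Derivative of the bump: phi'(t) = -2t / (1 - t^2)^2 * phi(t)."""
    return -2.0 * t / (1.0 - t * t) ** 2 * bump(t) if abs(t) < 1.0 else 0.0

def gradient_l2_norm(A, R, N, steps=40000):
    """L^2 norm of f'(x) for f(x) = A * bump(x/R) * sin(N x), by the midpoint rule."""
    h = 2.0 * R / steps
    total = 0.0
    for i in range(steps):
        x = -R + (i + 0.5) * h
        envelope = (A / R) * bump_prime(x / R) * math.sin(N * x)
        oscillation = A * N * bump(x / R) * math.cos(N * x)
        total += (envelope + oscillation) ** 2
    return math.sqrt(total * h)

A, R = 1.0, 1.0
# Ratio of the measured gradient norm to the predicted scale A * N * R^(1/2).
ratios = {N: gradient_l2_norm(A, R, N) / (A * N * math.sqrt(R)) for N in (25, 50, 100, 200)}
for N, ratio in ratios.items():
    print(N, ratio)
```

If the prediction is right, the printed ratios should be nearly independent of $N$ once $RN$ is large, since the envelope term contributes only a relative correction of order $1/(RN)^2$ to the squared norm.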

The dividing line between these regimes is the uncertainty principle for functions:

$$\boxed{R \cdot N \gtrsim 1.}$$

A function oscillating at frequency $N$ must spread over at least one wavelength in the oscillating direction: its spatial scale satisfies $R \gtrsim 1/N$. This is the same constraint in any dimension, since it is a statement about a single oscillating direction, not about volume. The factor of $d$ that enters the Sobolev exponent comes from a different place: it is the volume scaling $\|f\|_{L^p} \sim A R^{d/p}$ above, not from uncertainty itself.

Remark 2 (Classical uncertainty and Sobolev embedding)

The classical Fourier uncertainty principle states: if $f$ is concentrated in an interval of width $\Delta x$, then its Fourier transform $\hat{f}$ must be spread over a frequency range $\Delta\xi$ with $\Delta x \cdot \Delta\xi \gtrsim 1$ (the exact constant depends on the normalization of the Fourier transform). A function cannot be simultaneously localized in both space and frequency.

The Sobolev embedding theorem is the quantitative form of this principle. Trading $s$ derivatives of regularity (frequency control) for integrability (spatial behavior) is precisely the trade-off between frequency localization and spatial localization. The critical exponent $1/q = 1/p - s/d$ quantifies the exchange rate, combining the uncertainty bound $RN \gtrsim 1$ with volume scaling in $d$ dimensions.

Deriving the critical exponent from the uncertainty principle

Work in $\mathbb{R}^d$ with the model function above. For $W^{s,p}$, the highest-order term dominates in the well-behaved regime:

$$\|f\|_{W^{s,p}} \;\sim\; A\,R^{d/p}\,N^s.$$

Normalize $\|f\|_{W^{s,p}} = 1$, so $A \sim R^{-d/p} N^{-s}$. The $L^q$ norm is then

$$\|f\|_{L^q} \;\sim\; A\,R^{d/q} \;=\; R^{d/q - d/p}\,N^{-s}.$$

We want to find the largest $q$ such that $\|f\|_{L^q}$ stays bounded across all well-behaved functions (all $R, N$ with $RN \geq 1$). The most dangerous case is the extremal one, where the function is as concentrated as uncertainty allows: $R = 1/N$. Substituting:

$$\|f\|_{L^q} \;\sim\; N^{d/p - d/q - s}.$$

This is bounded for all $N \geq 1$ iff $d/p - d/q - s \leq 0$, i.e. $1/q \geq 1/p - s/d$. The critical case gives exactly the Sobolev exponent:

$$\frac{1}{q} = \frac{1}{p} - \frac{s}{d} \qquad \Longleftrightarrow \qquad q = p^* = \frac{dp}{d - sp}.$$

Functions at the uncertainty boundary $R = 1/N$ are the extremizers of the embedding: as concentrated as the uncertainty principle allows, making the $L^q$ norm as large as possible for a given $W^{s,p}$ norm. The Sobolev embedding theorem says that even these extremal functions have bounded $L^{p^*}$ norm.
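The exponent bookkeeping in this derivation is easy to mechanize. The following sketch uses exact rational arithmetic; the function names `sobolev_conjugate` and `extremal_growth_exponent` are hypothetical helpers introduced here.

```python
from fractions import Fraction

def sobolev_conjugate(p, s, d):
    """Return p* with 1/p* = 1/p - s/d, or None when sp >= d (no finite exponent)."""
    inverse = 1 / Fraction(p) - Fraction(s) / Fraction(d)
    return 1 / inverse if inverse > 0 else None

def extremal_growth_exponent(p, q, s, d):
    """Exponent of N in ||f||_{L^q} ~ N^(d/p - d/q - s) at the extremal scale R = 1/N."""
    return Fraction(d) / Fraction(p) - Fraction(d) / Fraction(q) - Fraction(s)

# One derivative in L^2 in three dimensions.
print(sobolev_conjugate(2, 1, 3))            # critical exponent p*
print(extremal_growth_exponent(2, 6, 1, 3))  # exponent of N at q = p*
print(extremal_growth_exponent(2, 7, 1, 3))  # exponent of N just above p*
```

A non-positive growth exponent means the extremal bumps stay bounded in $L^q$; a positive one means they blow up as $N \to \infty$, so the embedding into that $L^q$ must fail.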

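Notice what each ingredient contributed:

  • The derivative count contributed the factor $N^s$ in the $W^{s,p}$ norm.

  • Volume scaling contributed the powers $R^{d/p}$ and $R^{d/q}$, which is where the dimension $d$ enters.

  • The uncertainty principle contributed the constraint $R \gtrsim 1/N$, which ties the two scales together at the extremal case $R = 1/N$.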

A purely dimensional derivation of the same Sobolev exponent, via rescaling $u_\lambda(x) = u(\lambda x)$, is worked through in ex-sobolev-scaling.

The general embedding theorem

Theorem 2 (Sobolev embedding theorem)

Let $\Omega \subseteq \mathbb{R}^d$ be open with suitable regularity (for instance Lipschitz boundary, or $\Omega = \mathbb{R}^d$).

  1. Subcritical case ($kp < d$): $W^{k,p}(\Omega) \hookrightarrow L^{p^*}(\Omega)$, where $p^* = dp/(d - kp)$ is the Sobolev conjugate. Controlling $k$ derivatives in $L^p$ gives integrability up to $L^{p^*}$.

  2. Critical case ($kp = d$): $W^{k,p}(\Omega) \hookrightarrow L^q(\Omega)$ for all $q < \infty$ (but not $L^\infty$ in general). The function is “almost continuous” but may have logarithmic singularities.

  3. Supercritical case ($kp > d$): $W^{k,p}(\Omega) \hookrightarrow C^{m,\alpha}(\overline{\Omega})$, where the orders of smoothness and Hölder regularity are determined by the Sobolev number

    $$s \;=\; k - \tfrac{d}{p} \;>\; 0,$$

    which records “how much regularity is left over after spending $d/p$ derivatives on dimension.” The split is

    • If $d/p \notin \mathbb{Z}$: $m = \lfloor s \rfloor$ and $\alpha = s - \lfloor s \rfloor \in (0,1)$. Equivalently $m = k - \lfloor d/p \rfloor - 1$ and $\alpha = \lfloor d/p \rfloor + 1 - d/p$, so that $m + \alpha = s$.

    • If $d/p \in \mathbb{Z}$ (boundary case): $m = k - d/p - 1$ and the embedding holds for every $\alpha \in (0,1)$, but in general not for $\alpha = 1$ (Lipschitz can fail).

    Controlling enough derivatives in $L^p$ forces continuity, and even Hölder regularity, with the leftover budget $s$ split between integer smoothness ($m$) and a fractional Hölder exponent ($\alpha$).

Each clause is the intuition made precise. The subcritical exponent $p^*$ is the one forced by scaling and saturated by the uncertainty extremizers. The supercritical continuity statement is the 1D Poincaré phenomenon extended to general $d$ once enough derivatives are controlled to push $p^*$ past infinity. The critical case is the knife edge between them.
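The case analysis of the theorem can be turned into a small lookup. This is only a sketch of the bookkeeping, with a hypothetical function name `sobolev_target` and a plain-text output format chosen here; it takes the three cases and the formulas for $m$ and $\alpha$ exactly as stated in the theorem.

```python
from fractions import Fraction
import math

def sobolev_target(k, p, d):
    """Name the target space of W^{k,p} in dimension d, following the three cases."""
    kp = Fraction(k) * Fraction(p)
    d_over_p = Fraction(d) / Fraction(p)
    if kp < d:
        p_star = Fraction(d) * Fraction(p) / (d - kp)
        return f"L^q with q = p* = {p_star}"
    if kp == d:
        return "L^q for every finite q"
    if d_over_p.denominator != 1:
        # d/p is not an integer: split s = k - d/p into integer and fractional parts.
        m = k - math.floor(d_over_p) - 1
        alpha = math.floor(d_over_p) + 1 - d_over_p
        return f"C^({m},{alpha})"
    # Boundary case: d/p is an integer.
    return f"C^({int(k - d_over_p - 1)},alpha) for every alpha in (0,1)"

print(sobolev_target(1, 2, 3))  # one derivative in L^2, d = 3
print(sobolev_target(1, 2, 2))  # one derivative in L^2, d = 2
print(sobolev_target(1, 2, 1))  # the 1D Poincaré prototype
print(sobolev_target(2, 2, 2))  # boundary case, d/p an integer
```

The third call reproduces the $C^{0,1/2}$ regularity obtained by hand in the one-dimensional Poincaré calculation.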

Proof 1 (Sketch)

This is a sketch, not a full proof. Every case is the 1D Poincaré calculation re-used in $d$ dimensions.

Recall the 1D identity $u(x) = \int_0^x u'(t)\,dt$, which fed either Cauchy-Schwarz (to get Hölder continuity) or integration in $x$ (to get $L^p$ control). In $\mathbb{R}^d$ we apply the same identity along each coordinate axis: for compactly supported $u$ and each direction $i$,

$$|u(x)| \;\leq\; \int |\partial_i u|\,dy_i.$$

The regimes then split as follows.

  • Supercritical ($kp > d$, Morrey). For $k = 1$ and $p > d$, apply the 1D identity along the line segments joining $x$ and $y$ to the points of a ball of radius comparable to $|x-y|$, average over the ball, and estimate with Hölder. (A single segment is not enough when $d \geq 2$, since it has measure zero and cannot see the $L^p$ norm of $\nabla u$.) The result, $|u(x) - u(y)| \leq C\,|x-y|^{1-d/p}\,\|\nabla u\|_{L^p}$, is the Poincaré Hölder estimate with the exponent adjusted for dimension.

  • Subcritical ($kp < d$, Gagliardo-Nirenberg-Sobolev). Multiply the $d$ axial bounds, take the $1/(d-1)$ power, and integrate. This is the same “integrate the pointwise bound” move from Poincaré, but done simultaneously in all $d$ directions, and it delivers $\|u\|_{L^{d/(d-1)}} \leq C\,\|\nabla u\|_{L^1}$. The case $p > 1$ follows by applying this to $|u|^\gamma$ for a suitable $\gamma > 1$; iteration in $k$ lowers $1/q$ by $1/d$ at each step.

  • Critical ($kp = d$). The formula predicts $p^* = \infty$, but $L^\infty$ narrowly fails (think $\log\log(1/|x|)$ near the origin). On a bounded domain, applying the subcritical estimate with exponents slightly below $p$ recovers $L^q$ for every finite $q$.
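The subcritical step is easiest to see in the smallest case $d = 2$, where the power $1/(d-1)$ equals one. The following worked computation is a sketch for $u \in C_c^1(\mathbb{R}^2)$; the auxiliary functions $g_1, g_2$ are notation introduced here.

```latex
|u(x_1,x_2)| \le \int_{\mathbb{R}} |\partial_1 u(y_1,x_2)|\,dy_1 =: g_2(x_2),
\qquad
|u(x_1,x_2)| \le \int_{\mathbb{R}} |\partial_2 u(x_1,y_2)|\,dy_2 =: g_1(x_1),

\|u\|_{L^2}^2 = \iint |u|^2\,dx_1\,dx_2
  \;\le\; \int g_1(x_1)\,dx_1 \int g_2(x_2)\,dx_2
  = \|\partial_2 u\|_{L^1}\,\|\partial_1 u\|_{L^1}
  \;\le\; \|\nabla u\|_{L^1}^2 .
```

This is the estimate $\|u\|_{L^{d/(d-1)}} \leq C\,\|\nabla u\|_{L^1}$ with $d = 2$ and $C = 1$. In higher dimensions the product of the $d$ axial bounds no longer separates so cleanly, and the integration has to be done one variable at a time with the generalized Hölder inequality.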

Compact embeddings: Rellich-Kondrachov

The Sobolev embedding theorem tells us that a bounded set in $W^{k,p}$ is bounded in $L^{p^*}$. However, bounded in an infinite-dimensional space does not mean precompact (Remark 2). The remarkable fact is that if we ask for slightly less integrability ($q < p^*$ instead of $q = p^*$), the embedding becomes compact: bounded sequences not only stay bounded but have convergent subsequences.

Theorem 3 (Rellich-Kondrachov compactness)

Let $\Omega \subset \mathbb{R}^d$ be bounded with Lipschitz boundary, and let $1 \leq p < \infty$, $k \geq 1$.

  1. Subcritical ($kp < d$). With $1/p^* = 1/p - k/d$, the embedding $W^{k,p}(\Omega) \hookrightarrow L^q(\Omega)$ is compact for every $1 \leq q < p^*$. The endpoint $q = p^*$ is continuous but not compact.

  2. Critical ($kp = d$). The embedding $W^{k,p}(\Omega) \hookrightarrow L^q(\Omega)$ is compact for every $1 \leq q < \infty$. (The embedding is continuous into BMO, but in general not into $L^\infty$.)

  3. Supercritical ($kp > d$). If $\alpha = k - d/p \in (0, 1)$, the embedding $W^{k,p}(\Omega) \hookrightarrow C^{0,\beta}(\overline{\Omega})$ is compact for every $0 \leq \beta < \alpha$, and the endpoint $\beta = \alpha$ is continuous but not compact. If $k - d/p \geq 1$, the embedding into $C^{0,\beta}(\overline{\Omega})$ is compact for every $0 \leq \beta < 1$. In particular, the embedding into $C(\overline{\Omega})$ is compact whenever $kp > d$.

The pattern is uniform across regimes: compactness holds strictly below the sharp embedding exponent, and fails at it. The obstruction at the endpoint is always the same scaling/concentration phenomenon: the critical embedding is scale-invariant, so rescaled bumps such as $u_n(x) = \lambda_n^{d/p^*} \varphi(\lambda_n x)$ with $\lambda_n \to \infty$ (in the subcritical case) keep a fixed $L^{p^*}$ norm and a bounded $W^{k,p}$ norm while concentrating at a point, so no subsequence converges in $L^{p^*}$. Giving up an arbitrarily small amount of the exponent breaks the scale invariance and restores compactness.

Proof 2

We prove the key case $H^1(\Omega) \hookrightarrow L^2(\Omega)$ using Fourier analysis; the general case follows the same logic.

What we are actually proving. You might expect a compactness proof to take a bounded sequence $\{u_n\}$ and extract a Cauchy subsequence by hand. We do not do that. Instead, we use the characterization

$$\text{precompact in a complete metric space} \;\Longleftrightarrow\; \text{totally bounded}$$

(Definition 1), where totally bounded means: for every $\varepsilon > 0$, the set is covered by finitely many $\varepsilon$-balls. So we never mention sequences. Instead we show that the $H^1$-unit ball $B$, viewed inside $L^2$, is approximately finite-dimensional: at any resolution $\varepsilon > 0$, every element of $B$ is within $\varepsilon$ of a finite set.

This is a concrete instance of the general Kolmogorov-Riesz-Fréchet criterion (Theorem 2): a bounded set in $L^p$ is precompact iff it is equicontinuous under translation and tight. The gradient bound in the $H^1$-norm is precisely what supplies the equicontinuity condition, since $\|\tau_h u - u\|_{L^2} \leq |h|\,\|\nabla u\|_{L^2}$, and the bounded domain supplies tightness for free. Seen this way, Rellich-Kondrachov is not a miracle: it is Kolmogorov-Riesz with the equicontinuity hypothesis automatically verified by gradient control. The Fourier-cutoff proof below is one convenient way to package the argument.

Once total boundedness is in hand, subsequence extraction is automatic: any bounded sequence lands in a multiple of $B$, and a totally bounded set in a complete space has convergent subsequences for free (this is the $(\Leftarrow)$ direction of the equivalence above). So the proof below is shorter and more structural than any direct subsequence argument would be, and it explains why compactness holds: because the derivative bound crushes the infinite tail of Fourier modes down to a size we can ignore, leaving only a finite-dimensional piece.

The plan in two lines:

  1. Split $u = u_{\leq N} + u_{>N}$ at Fourier cutoff $N$.

  2. The tail $u_{>N}$ is $L^2$-small uniformly in $u \in B$, and the head $u_{\leq N}$ lives in a finite-dimensional subspace.

That is the whole argument. The rest is bookkeeping.

Let $B = \{u \in H^1(\Omega) : \|u\|_{H^1} \leq 1\}$ and consider $B$ as a subset of $L^2(\Omega)$. To keep the bookkeeping clean we work on the torus with an orthonormal Fourier basis $\{e_k\}$; a general bounded Lipschitz domain reduces to this setting by extending $u$ to a compactly supported $H^1$ function on a larger periodic box. Every $u \in B$ then has an expansion $u = \sum_k \hat{u}_k e_k$ with

$$\sum_k (1 + |k|^2)\,|\hat{u}_k|^2 = \|u\|_{H^1}^2 \leq 1.$$

Split $u$ into low and high frequencies at a cutoff $N$:

$$u = \underbrace{\sum_{|k| \leq N} \hat{u}_k\, e_k}_{u_{\leq N}} + \underbrace{\sum_{|k| > N} \hat{u}_k\, e_k}_{u_{>N}}.$$

The high-frequency tail is uniformly small. For every $u \in B$:

$$\|u_{>N}\|_{L^2}^2 = \sum_{|k|>N} |\hat{u}_k|^2 \leq \frac{1}{1 + N^2} \sum_{|k|>N} (1 + |k|^2)\,|\hat{u}_k|^2 \leq \frac{1}{1 + N^2}.$$

This bound is uniform over all $u \in B$: it depends only on $N$, not on the particular function. Given any $\varepsilon > 0$, choose $N$ large enough that $1/(1+N^2) < \varepsilon^2$.

The low-frequency part is finite-dimensional. The projection $u \mapsto u_{\leq N}$ maps $B$ into the span of the finitely many modes $\{e_k : |k| \leq N\}$, which is a finite-dimensional subspace of $L^2$, and it does not increase the $L^2$ norm, so the image is bounded. A bounded set in a finite-dimensional space is precompact (Bolzano-Weierstrass).

Combining: every element of $B$ is within $\varepsilon$ (in $L^2$) of the precompact set $\{u_{\leq N} : u \in B\}$. A set that can be approximated to arbitrary accuracy by precompact sets is itself precompact. So $B$ is precompact in $L^2$, i.e. the embedding is compact.
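The choice of cutoff in the proof is explicit enough to compute. The sketch below finds the smallest admissible $N$ for a given $\varepsilon$ and, as an assumption made for illustration, counts the modes of the head on the one-dimensional torus (the circle); both function names are hypothetical.

```python
def fourier_cutoff(eps):
    """Smallest integer N >= 0 with 1 / (1 + N^2) < eps^2."""
    N = 0
    while 1.0 / (1.0 + N * N) >= eps * eps:
        N += 1
    return N

def head_dimension_1d(N):
    """Number of integer modes k with |k| <= N: the dimension of the head on the circle."""
    return 2 * N + 1

for eps in (0.5, 0.1, 0.01):
    N = fourier_cutoff(eps)
    print(eps, N, head_dimension_1d(N))
```

The cutoff grows like $1/\varepsilon$, so the dimension of the approximating subspace stays finite at every resolution, which is the content of total boundedness.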

How each hypothesis defeats each pathology

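The proof makes explicit how the two hypotheses of Rellich-Kondrachov match the two pathologies (P1), (P2) identified at the start of the chapter:

  • The derivative bound defeats oscillation (P1): it makes the high-frequency tail $u_{>N}$ uniformly small.

  • The bounded domain defeats translation (P2): it makes the frequencies discrete, so the head $u_{\leq N}$ lives in a finite-dimensional space.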

Both hypotheses are essential. On $\mathbb{R}^d$, $H^1 \hookrightarrow L^2$ is not compact, because translation sequences defeat it. On a bounded domain without the derivative bound, the identity $L^2 \hookrightarrow L^2$ is not compact in infinite dimensions, because oscillations defeat it.

Rellich-Kondrachov is the Sobolev-space counterpart of the Arzelà-Ascoli theorem (Theorem 1). In Arzelà-Ascoli, equicontinuity prevents rapid spatial oscillation; here, the $H^1$ bound prevents mass from sitting at high frequencies. In both cases, a regularity condition suppresses one pathology and a bounded domain suppresses the other. Together they force approximate finite-dimensionality.

Connection to compact operators

In the language of the compact operators chapter (Definition 1), the embedding $\iota : H^1(\Omega) \hookrightarrow L^2(\Omega)$ is a compact operator: it maps the bounded $H^1$ unit ball to a precompact subset of $L^2$. This is why compact embeddings and compact operators are interchangeable language.

The payoff for PDE is that compact embeddings upgrade weak convergence to strong convergence, which is essential for passing to the limit through nonlinearities. See the applications chapter for the full development: the weak formulation via Lax-Milgram (Theorem 1) and the nonlinear PDE example (nonlinear-pde-tools) both rely on Rellich-Kondrachov as a critical step.