
Big Idea

Computing the Jacobian costs $\mathcal{O}(n^2)$ evaluations per iteration. Quasi-Newton methods build an approximation $B_k \approx D\mathbf{F}(\mathbf{x}_k)$ and update it cheaply using information from the iteration. The trade-off: superlinear convergence instead of quadratic, but much cheaper iterations.

Each Newton iteration requires evaluating $n^2$ partial derivatives and an $\mathcal{O}(n^3)$ LU factorization. For large $n$, or when derivatives are expensive, this is prohibitive. Quasi-Newton methods replace the exact Jacobian with an approximation $B_k \approx D\mathbf{F}(\mathbf{x}_k)$ that is updated cheaply ($\mathcal{O}(n^2)$ per step) using information from the iteration itself.

From the Secant Method to Systems

In one dimension, the secant method avoids computing $f'$ by approximating it from two previous iterates:

$$f'(x_k) \approx \frac{f(x_k) - f(x_{k-1})}{x_k - x_{k-1}}$$

This replaces the tangent line (Newton) with a secant line through the two most recent points. The secant method converges with order $p \approx 1.618$ (the golden ratio), which is between linear and quadratic.
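
To make this concrete, here is a minimal secant iteration in Python (a sketch written for this note; the test function $\cos x - x$ is just an illustrative choice):

```python
import math

def secant(f, x0, x1, tol=1e-12, max_iter=50):
    """Secant method: Newton's iteration with f' replaced by the
    finite-difference slope through the two most recent iterates."""
    for _ in range(max_iter):
        f0, f1 = f(x0), f(x1)
        x2 = x1 - f1 * (x1 - x0) / (f1 - f0)  # secant step
        if abs(x2 - x1) < tol:
            return x2
        x0, x1 = x1, x2
    return x1

print(secant(lambda x: math.cos(x) - x, 0.5, 1.0))  # ~0.739085
```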

For a system $\mathbf{F}: \mathbb{R}^n \to \mathbb{R}^n$, we want to generalize this idea: build an approximation $B_{k+1}$ to the Jacobian using the step $\mathbf{s}_k = \mathbf{x}_{k+1} - \mathbf{x}_k$ and the corresponding change in $\mathbf{F}$, $\mathbf{y}_k = \mathbf{F}(\mathbf{x}_{k+1}) - \mathbf{F}(\mathbf{x}_k)$. The natural requirement is that $B_{k+1}$ should reproduce this finite difference exactly.

Definition 1 (The Secant Condition)

The approximation $B_{k+1}$ must satisfy:

$$B_{k+1}\mathbf{s}_k = \mathbf{y}_k$$

where $\mathbf{s}_k = \mathbf{x}_{k+1} - \mathbf{x}_k$ and $\mathbf{y}_k = \mathbf{F}(\mathbf{x}_{k+1}) - \mathbf{F}(\mathbf{x}_k)$.

This says: the approximation should be exact in the direction we just moved. But the secant condition gives only $n$ equations for $n^2$ unknowns (the entries of $B_{k+1}$), so infinitely many matrices satisfy it. We need an additional criterion to select one.
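
To see the underdetermination concretely, consider $n = 2$ with example values chosen for this note: the secant condition fixes only how $B_{k+1}$ acts on $\mathbf{s}_k$, so distinct matrices can both satisfy it.

```python
import numpy as np

s = np.array([1.0, 0.0])   # step direction s_k
y = np.array([2.0, 3.0])   # observed change y_k

# With s along the first axis, B @ s = y only pins down the first column.
B1 = np.array([[2.0, 5.0], [3.0, -1.0]])
B2 = np.array([[2.0, 0.0], [3.0, 7.0]])
print(B1 @ s, B2 @ s)      # both give [2. 3.]: the secant condition holds
```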

Broyden’s Method

The simplest and most natural choice: among all matrices satisfying the secant condition, pick the one that changes $B_k$ as little as possible. This leads to Broyden’s method (1965).

Definition 2 (Broyden’s Update)

Given the current approximation $B_k$, the updated approximation is:

$$B_{k+1} = B_k + \frac{(\mathbf{y}_k - B_k\mathbf{s}_k)\mathbf{s}_k^T}{\mathbf{s}_k^T\mathbf{s}_k}$$

Proof 1 (Derivation)

We solve the constrained minimization problem:

$$\min_{B_{k+1}} \|B_{k+1} - B_k\|_F \quad \text{subject to} \quad B_{k+1}\mathbf{s}_k = \mathbf{y}_k$$

The constraint requires $(B_{k+1} - B_k)\mathbf{s}_k = \mathbf{y}_k - B_k\mathbf{s}_k$. The minimum Frobenius norm perturbation satisfying a single linear constraint is a rank-1 matrix:

$$B_{k+1} - B_k = \frac{(\mathbf{y}_k - B_k\mathbf{s}_k)\mathbf{s}_k^T}{\mathbf{s}_k^T\mathbf{s}_k}$$

To verify: multiply both sides by $\mathbf{s}_k$ on the right to recover the constraint. The factor $\mathbf{s}_k^T/(\mathbf{s}_k^T\mathbf{s}_k)$ is a projection that ensures the perturbation acts only in the direction $\mathbf{s}_k$, leaving $B_k$ unchanged in all orthogonal directions.
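
A quick numerical check of the two properties used above, with random data invented for this sketch: the updated matrix satisfies the secant condition exactly and agrees with $B_k$ on every direction orthogonal to $\mathbf{s}_k$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
B = rng.standard_normal((n, n))   # current approximation B_k
s = rng.standard_normal(n)        # step s_k
y = rng.standard_normal(n)        # change in F, y_k

B_new = B + np.outer(y - B @ s, s) / (s @ s)   # Broyden's rank-1 update

print(np.allclose(B_new @ s, y))       # True: secant condition
t = rng.standard_normal(n)
t -= (t @ s) / (s @ s) * s             # project t orthogonal to s
print(np.allclose(B_new @ t, B @ t))   # True: unchanged off the step direction
```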

The update is a rank-1 perturbation of $B_k$, which means it costs only $\mathcal{O}(n^2)$ operations to apply and, as shown below, lets us maintain the inverse cheaply via the Sherman-Morrison formula.

The Algorithm

Algorithm 1 (Broyden’s Method)

Input: $\mathbf{F}$, initial guess $\mathbf{x}_0$, initial $B_0 \approx D\mathbf{F}(\mathbf{x}_0)$, tolerance $\varepsilon$

Output: Approximate root $\mathbf{x}$

  1. for $k = 0, 1, 2, \ldots$:

  2. $\qquad$ Solve $B_k \Delta\mathbf{x} = -\mathbf{F}(\mathbf{x}_k)$

  3. $\qquad$ $\mathbf{x}_{k+1} \gets \mathbf{x}_k + \Delta\mathbf{x}$

  4. $\qquad$ if $\|\Delta\mathbf{x}\| < \varepsilon$: return $\mathbf{x}_{k+1}$

  5. $\qquad$ $\mathbf{s}_k \gets \Delta\mathbf{x}$

  6. $\qquad$ $\mathbf{y}_k \gets \mathbf{F}(\mathbf{x}_{k+1}) - \mathbf{F}(\mathbf{x}_k)$

  7. $\qquad$ $B_{k+1} \gets B_k + \dfrac{(\mathbf{y}_k - B_k\mathbf{s}_k)\mathbf{s}_k^T}{\mathbf{s}_k^T\mathbf{s}_k}$
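
Here is a direct NumPy transcription of Algorithm 1 (a sketch; the example system and starting data are invented for illustration). Step 2 is kept as an explicit linear solve for clarity; the next section shows how to avoid it.

```python
import numpy as np

def broyden(F, x0, B0, tol=1e-10, max_iter=100):
    """Broyden's method (Algorithm 1). B0 should approximate DF(x0)."""
    x = np.asarray(x0, dtype=float)
    B = np.asarray(B0, dtype=float)
    Fx = F(x)
    for _ in range(max_iter):
        dx = np.linalg.solve(B, -Fx)                  # step 2
        x = x + dx                                    # step 3
        if np.linalg.norm(dx) < tol:                  # step 4
            return x
        Fx_new = F(x)
        y = Fx_new - Fx                               # step 6
        B = B + np.outer(y - B @ dx, dx) / (dx @ dx)  # step 7
        Fx = Fx_new
    return x

# Example: F(x, y) = (x^2 + y^2 - 1, x - y), root at (1/sqrt(2), 1/sqrt(2)).
F = lambda v: np.array([v[0]**2 + v[1]**2 - 1.0, v[0] - v[1]])
B0 = np.array([[2.0, 1.0], [1.0, -1.0]])   # exact Jacobian at x0 = (1, 0.5)
print(broyden(F, [1.0, 0.5], B0))          # ~[0.70710678 0.70710678]
```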

Efficient Implementation via Sherman-Morrison

Each iteration requires solving $B_k \Delta\mathbf{x} = -\mathbf{F}$, which naively costs $\mathcal{O}(n^3)$ for LU factorization. Since $B_{k+1}$ differs from $B_k$ by a rank-1 matrix, we can instead maintain $H_k = B_k^{-1}$ directly and update it using the Sherman-Morrison formula:

Proposition 1 (Sherman-Morrison Formula)

If $A$ is invertible and $A + \mathbf{u}\mathbf{v}^T$ is invertible, then:

$$(A + \mathbf{u}\mathbf{v}^T)^{-1} = A^{-1} - \frac{A^{-1}\mathbf{u}\mathbf{v}^T A^{-1}}{1 + \mathbf{v}^T A^{-1}\mathbf{u}}$$

Applied to Broyden’s update, the inverse approximation $H_k = B_k^{-1}$ satisfies:

$$H_{k+1} = H_k + \frac{(\mathbf{s}_k - H_k\mathbf{y}_k)\mathbf{s}_k^T H_k}{\mathbf{s}_k^T H_k \mathbf{y}_k}$$

With this approach, each iteration costs $\mathcal{O}(n^2)$ (matrix-vector products for the update, and $\Delta\mathbf{x} = -H_k\mathbf{F}$ is a matrix-vector multiply instead of a linear solve).
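
The same iteration, maintaining $H_k = B_k^{-1}$ via the update above so that each step is a matrix-vector product rather than a solve (again a sketch, reusing the invented example system):

```python
import numpy as np

def broyden_inverse(F, x0, H0, tol=1e-10, max_iter=100):
    """Broyden's method maintaining H_k = B_k^{-1} (Sherman-Morrison form),
    so each iteration costs O(n^2). H0 should approximate DF(x0)^{-1}."""
    x = np.asarray(x0, dtype=float)
    H = np.asarray(H0, dtype=float)
    Fx = F(x)
    for _ in range(max_iter):
        dx = -H @ Fx                       # matrix-vector product, no solve
        x = x + dx
        if np.linalg.norm(dx) < tol:
            return x
        Fx_new = F(x)
        y = Fx_new - Fx
        Hy = H @ y
        H = H + np.outer(dx - Hy, dx @ H) / (dx @ Hy)  # rank-1 inverse update
        Fx = Fx_new
    return x

F = lambda v: np.array([v[0]**2 + v[1]**2 - 1.0, v[0] - v[1]])
H0 = np.linalg.inv(np.array([[2.0, 1.0], [1.0, -1.0]]))  # DF(x0)^{-1}
print(broyden_inverse(F, [1.0, 0.5], H0))                # ~[0.70710678 0.70710678]
```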

Convergence

Theorem 1 (Superlinear Convergence of Broyden’s Method)

Suppose $\mathbf{F}(\mathbf{x}^*) = \mathbf{0}$, $D\mathbf{F}(\mathbf{x}^*)$ is nonsingular, and $D\mathbf{F}$ is Lipschitz continuous near $\mathbf{x}^*$. If $\mathbf{x}_0$ is sufficiently close to $\mathbf{x}^*$ and $B_0$ is sufficiently close to $D\mathbf{F}(\mathbf{x}^*)$, then Broyden’s method converges superlinearly:

$$\lim_{k \to \infty} \frac{\|\mathbf{x}_{k+1} - \mathbf{x}^*\|}{\|\mathbf{x}_k - \mathbf{x}^*\|} = 0$$

Proof 2

Write $\mathbf{e}_k = \mathbf{x}_k - \mathbf{x}^*$ and $E_k = B_k - D\mathbf{F}(\mathbf{x}^*)$. The Newton step with exact Jacobian would give $\mathbf{e}_{k+1}^{\text{Newton}} = \mathcal{O}(\|\mathbf{e}_k\|^2)$ (quadratic). With Broyden’s approximation, the error satisfies:

$$\mathbf{e}_{k+1} = \mathbf{e}_k + \Delta\mathbf{x}_k = \mathbf{e}_k - B_k^{-1}\mathbf{F}(\mathbf{x}_k)$$

Adding and subtracting $D\mathbf{F}(\mathbf{x}^*)^{-1}\mathbf{F}(\mathbf{x}_k)$:

$$\mathbf{e}_{k+1} = \underbrace{\mathbf{e}_k - D\mathbf{F}(\mathbf{x}^*)^{-1}\mathbf{F}(\mathbf{x}_k)}_{\mathcal{O}(\|\mathbf{e}_k\|^2) \text{ (Newton residual)}} + \underbrace{\left(D\mathbf{F}(\mathbf{x}^*)^{-1} - B_k^{-1}\right)\mathbf{F}(\mathbf{x}_k)}_{\text{Jacobian approximation error}}$$

The first term is $\mathcal{O}(\|\mathbf{e}_k\|^2)$ from the standard Newton analysis. The second term is bounded by $\|D\mathbf{F}(\mathbf{x}^*)^{-1}\| \cdot \|E_k\| \cdot \|\mathbf{e}_k\|$ (since $\|\mathbf{F}(\mathbf{x}_k)\| = \mathcal{O}(\|\mathbf{e}_k\|)$).

The key result (Dennis and Moré, 1974) is that the Broyden update satisfies:

$$\|E_{k+1}\mathbf{s}_k\| \leq \|E_k\mathbf{s}_k\| + \mathcal{O}(\|\mathbf{e}_k\|^2)$$

The error in $B_k$ does not grow in the direction $\mathbf{s}_k$, and the $\mathcal{O}(\|\mathbf{e}_k\|^2)$ term shrinks as we converge. A careful analysis (the Dennis-Moré characterization theorem) shows that the Jacobian approximation error satisfies:

$$\frac{\|E_k \mathbf{s}_k\|}{\|\mathbf{s}_k\|} \to 0$$

This means $B_k$ becomes increasingly accurate in the directions that matter (the step directions), even though $\|E_k\|$ may not converge to zero overall. Combined with the $\mathcal{O}(\|\mathbf{e}_k\|^2)$ Newton residual, this gives $\|\mathbf{e}_{k+1}\|/\|\mathbf{e}_k\| \to 0$, i.e., superlinear convergence.

The convergence is superlinear but not quadratic because $B_k$ is only an approximation to the Jacobian. Newton’s method uses the exact Jacobian at each step, giving $\mathcal{O}(\|\mathbf{e}_k\|^2)$ error. Broyden’s method adds a Jacobian approximation error that is $o(\|\mathbf{e}_k\|)$ (vanishing relative to the error) but not $\mathcal{O}(\|\mathbf{e}_k\|^2)$. The approximation improves in the step directions as the iteration progresses, which is enough for superlinear but not quadratic convergence.
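
A small experiment (reusing the example system from above) makes the rate visible: the error ratios $\|\mathbf{e}_{k+1}\|/\|\mathbf{e}_k\|$ shrink toward zero rather than settling at a constant (linear) or squaring the error each step (quadratic). Exact values depend on rounding.

```python
import numpy as np

F = lambda v: np.array([v[0]**2 + v[1]**2 - 1.0, v[0] - v[1]])
root = np.array([np.sqrt(0.5), np.sqrt(0.5)])   # known root for this example

x = np.array([1.0, 0.5])
B = np.array([[2.0, 1.0], [1.0, -1.0]])         # exact Jacobian at x0
Fx, errs = F(x), [np.linalg.norm(x - root)]
for _ in range(6):
    dx = np.linalg.solve(B, -Fx)
    x = x + dx
    errs.append(np.linalg.norm(x - root))
    Fx_new = F(x)
    B = B + np.outer(Fx_new - Fx - B @ dx, dx) / (dx @ dx)  # Broyden update
    Fx = Fx_new

print([errs[k + 1] / errs[k] for k in range(len(errs) - 1)])  # ratios -> 0
```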

Broyden’s “Good” and “Bad” Methods

The update above is called Broyden’s first method (or “good” Broyden). It updates $B_k$ to match the secant condition $B_{k+1}\mathbf{s}_k = \mathbf{y}_k$.

Broyden’s second method (or “bad” Broyden) instead updates the inverse $H_k = B_k^{-1}$ to satisfy the inverse secant condition $H_{k+1}\mathbf{y}_k = \mathbf{s}_k$, using the analogous minimum-change principle:

$$H_{k+1} = H_k + \frac{(\mathbf{s}_k - H_k\mathbf{y}_k)\mathbf{y}_k^T}{\mathbf{y}_k^T\mathbf{y}_k}$$

The names are historical and misleading: neither method is universally better. Broyden’s first method tends to work better when $D\mathbf{F}$ is well conditioned; the second method can be preferable when $D\mathbf{F}$ is ill conditioned, since it avoids inverting the approximation.
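
In the NumPy conventions used above, the second method’s inverse update is a one-liner (a sketch; the helper name is invented here):

```python
import numpy as np

def bad_broyden_update(H, s, y):
    """Broyden's second ('bad') method: minimum Frobenius-norm change to H
    subject to the inverse secant condition H_new @ y == s."""
    return H + np.outer(s - H @ y, y) / (y @ y)
```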

Error-Oriented Quasi-Newton (QNERR)

Broyden’s method is only locally convergent: it has no mechanism to detect or recover from a bad approximation. In Deuflhard’s framework (see Globalization), the natural combination of quasi-Newton with globalization is QNERR: the error-oriented quasi-Newton method.

In NLEQ_ERR, once the contraction factor $\theta_k$ drops below a threshold $\theta_{\max}$ (typically 0.5), the iteration is converging well and recomputing the Jacobian is wasteful. QNERR freezes the Jacobian and applies rank-1 corrections using the iteration history, while continuing to monitor $\theta_k$. If the contraction degrades ($\theta_k > \theta_{\max}$), QNERR hands control back to NLEQ_ERR, which recomputes the Jacobian.

This gives a hybrid strategy:

- Newton phase (NLEQ_ERR): compute the exact Jacobian and take Newton steps until $\theta_k < \theta_{\max}$ signals good contraction.
- Quasi-Newton phase (QNERR): freeze the Jacobian and apply cheap rank-1 corrections while $\theta_k$ stays below $\theta_{\max}$.
- Fallback: if $\theta_k > \theta_{\max}$, return to the Newton phase and recompute the Jacobian.

Unlike standalone Broyden, QNERR always starts from the true Jacobian (computed in the Newton phase) and has a built-in safety net through $\theta_k$ monitoring. It also inherits the affine invariance of NLEQ_ERR, since it monitors $\|\Delta\mathbf{x}\|$ rather than $\|\mathbf{F}\|$.
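
The control flow can be sketched as follows. This is a schematic illustration of the Newton / quasi-Newton hand-off described above, not Deuflhard’s actual QNERR code: it omits damping, and both the user-supplied Jacobian routine `jac` and the simple $\theta_k$ test are assumptions of this sketch.

```python
import numpy as np

def hybrid_newton_quasi(F, jac, x0, theta_max=0.5, tol=1e-10, max_iter=100):
    """Schematic Newton / quasi-Newton hybrid: exact Jacobian to start,
    cheap rank-1 corrections while the contraction factor theta_k stays
    below theta_max, and a Jacobian recompute when contraction degrades."""
    x = np.asarray(x0, dtype=float)
    B = jac(x)                              # Newton phase: true Jacobian
    Fx = F(x)
    dx_prev = None
    for _ in range(max_iter):
        dx = np.linalg.solve(B, -Fx)
        if dx_prev is not None:
            theta = np.linalg.norm(dx) / np.linalg.norm(dx_prev)
            if theta > theta_max:           # contraction degraded:
                B = jac(x)                  # hand back to the Newton phase
                dx = np.linalg.solve(B, -Fx)
        x = x + dx
        if np.linalg.norm(dx) < tol:        # error-oriented test on ||dx||
            return x
        Fx_new = F(x)
        B = B + np.outer(Fx_new - Fx - B @ dx, dx) / (dx @ dx)  # rank-1 fix
        Fx, dx_prev = Fx_new, dx
    return x
```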