Computing the Jacobian costs $n^2$ partial-derivative evaluations per iteration. Quasi-Newton methods build an approximation $B_k \approx J(x_k)$ and update it cheaply using information from the iteration. The trade-off: superlinear convergence instead of quadratic, but much cheaper iterations.

Each Newton iteration requires evaluating $n^2$ partial derivatives and an $O(n^3)$ LU factorization. For large $n$, or when derivatives are expensive, this is prohibitive. Quasi-Newton methods replace the exact Jacobian with an approximation $B_k$ that is updated cheaply ($O(n^2)$ per step) using information from the iteration itself.
## From the Secant Method to Systems
In one dimension, the secant method avoids computing $f'(x_k)$ by approximating it from two previous iterates:

$$f'(x_k) \approx \frac{f(x_k) - f(x_{k-1})}{x_k - x_{k-1}}$$
This replaces the tangent line (Newton) with a secant line through the two most recent points. The secant method converges with order $\varphi = \frac{1+\sqrt{5}}{2} \approx 1.618$ (the golden ratio), which is between linear and quadratic.
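For concreteness, here is a minimal sketch of the 1-D secant iteration in Python (the function name and arguments are illustrative, not from any particular library):

```python
def secant(f, x0, x1, tol=1e-12, max_iter=50):
    """Find a root of f using secant approximations to f'."""
    f0, f1 = f(x0), f(x1)
    for _ in range(max_iter):
        # Secant approximation to f'(x1) from the two most recent iterates
        slope = (f1 - f0) / (x1 - x0)
        x0, f0 = x1, f1
        x1 = x1 - f1 / slope
        f1 = f(x1)
        if abs(f1) < tol:
            return x1
    return x1

# Example: root of x^2 - 2, starting from the bracket [1, 2]
root = secant(lambda x: x**2 - 2, 1.0, 2.0)   # ~1.41421356
```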
For a system $F(x) = 0$ with $F : \mathbb{R}^n \to \mathbb{R}^n$, we want to generalize this idea: build an approximation $B_k$ to the Jacobian using the step $s_k = x_{k+1} - x_k$ and the corresponding change in $F$, $y_k = F(x_{k+1}) - F(x_k)$. The natural requirement is that $B_{k+1}$ should reproduce this finite difference exactly.
Definition 1 (The Secant Condition)

$$B_{k+1} s_k = y_k$$
This says: the approximation should be exact in the direction we just moved. But the secant condition gives only $n$ equations for $n^2$ unknowns (the entries of $B_{k+1}$), so infinitely many matrices satisfy it. We need an additional criterion to select one.
## Broyden’s Method
The simplest and most natural choice: among all matrices satisfying the secant condition, pick the one that changes as little as possible. This leads to Broyden’s method (1965).
Definition 2 (Broyden’s Update)

Given the current approximation $B_k$, the updated approximation is:

$$B_{k+1} = B_k + \frac{(y_k - B_k s_k)\, s_k^T}{s_k^T s_k}$$
Proof 1 (Derivation)

We solve the constrained minimization problem:

$$\min_{B \in \mathbb{R}^{n \times n}} \|B - B_k\|_F \quad \text{subject to} \quad B s_k = y_k$$

The constraint requires $(B - B_k)\, s_k = y_k - B_k s_k$. The minimum Frobenius norm perturbation satisfying a single linear constraint is a rank-1 matrix:

$$B - B_k = \frac{(y_k - B_k s_k)\, s_k^T}{s_k^T s_k}$$

To verify: multiply both sides by $s_k$ on the right to recover the constraint. The factor $s_k^T / (s_k^T s_k)$ acts as a projection onto the direction $s_k$: the perturbation vanishes on every vector orthogonal to $s_k$, leaving $B_k$ unchanged in all orthogonal directions.
The update is a rank-1 perturbation of $B_k$, which means:

- $B_{k+1}$ agrees with $B_k$ on all directions orthogonal to $s_k$.
- $B_k$ is corrected only in the direction we just moved, where we have new information.
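Both properties are easy to check numerically. A quick sketch with random data (NumPy; the variable names are mine):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
B = rng.standard_normal((n, n))    # current approximation B_k
s = rng.standard_normal(n)         # step s_k
y = rng.standard_normal(n)         # observed change y_k

# Broyden update: rank-1 correction in the direction s_k
B_new = B + np.outer(y - B @ s, s) / (s @ s)

# Secant condition holds exactly: B_new s = y
assert np.allclose(B_new @ s, y)

# Any direction orthogonal to s is untouched: B_new z = B z
z = rng.standard_normal(n)
z -= (z @ s) / (s @ s) * s         # remove the component along s
assert np.allclose(B_new @ z, B @ z)
```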
## The Algorithm
Algorithm 1 (Broyden’s Method)

Input: $F$, initial guess $x_0$, initial approximation $B_0$ (e.g., $J(x_0)$ or a finite-difference approximation), tolerance $\tau$
Output: Approximate root $x^*$

for $k = 0, 1, 2, \ldots$:
1. Solve $B_k s_k = -F(x_k)$ for $s_k$
2. $x_{k+1} = x_k + s_k$
3. if $\|F(x_{k+1})\| \le \tau$: return $x_{k+1}$
4. $y_k = F(x_{k+1}) - F(x_k)$
5. $B_{k+1} = B_k + \dfrac{(y_k - B_k s_k)\, s_k^T}{s_k^T s_k}$
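A direct NumPy translation of Algorithm 1 might look as follows. This is a sketch: `fd_jacobian` is an assumed helper for building $B_0$ by forward differences, and no safeguards (damping, line search) are included.

```python
import numpy as np

def fd_jacobian(F, x, h=1e-7):
    """Forward-difference Jacobian of F at x, built column by column."""
    x = np.asarray(x, dtype=float)
    Fx = F(x)
    J = np.empty((Fx.size, x.size))
    for j in range(x.size):
        xp = x.copy()
        xp[j] += h
        J[:, j] = (F(xp) - Fx) / h
    return J

def broyden(F, x0, B0, tol=1e-10, max_iter=100):
    """Broyden's method (Algorithm 1): solve F(x) = 0 with rank-1 Jacobian updates."""
    x = np.asarray(x0, dtype=float)
    B = np.array(B0, dtype=float)
    Fx = F(x)
    for _ in range(max_iter):
        s = np.linalg.solve(B, -Fx)              # step 1: solve B_k s_k = -F(x_k)
        x = x + s                                # step 2: x_{k+1} = x_k + s_k
        Fx_new = F(x)
        if np.linalg.norm(Fx_new) <= tol:        # step 3: convergence check
            return x
        y = Fx_new - Fx                          # step 4: y_k = F(x_{k+1}) - F(x_k)
        B += np.outer(y - B @ s, s) / (s @ s)    # step 5: Broyden update
        Fx = Fx_new
    raise RuntimeError("Broyden's method did not converge")

# Example: intersection of the unit circle with the line x = y.
F = lambda v: np.array([v[0]**2 + v[1]**2 - 1.0, v[0] - v[1]])
x0 = np.array([1.0, 0.5])
root = broyden(F, x0, fd_jacobian(F, x0))        # approx (0.7071, 0.7071)
```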
## Efficient Implementation via Sherman-Morrison
Each iteration requires solving $B_k s_k = -F(x_k)$, which naively costs $O(n^3)$ for LU factorization. Since $B_{k+1}$ differs from $B_k$ by a rank-1 matrix, we can instead maintain $H_k = B_k^{-1}$ directly and update it using the Sherman-Morrison formula:
Proposition 1 (Sherman-Morrison Formula)

If $A \in \mathbb{R}^{n \times n}$ is invertible and $1 + v^T A^{-1} u \neq 0$, then $A + u v^T$ is invertible and:

$$(A + u v^T)^{-1} = A^{-1} - \frac{A^{-1} u v^T A^{-1}}{1 + v^T A^{-1} u}$$

Applied to Broyden’s update (with $u = (y_k - B_k s_k)/(s_k^T s_k)$ and $v = s_k$), the inverse approximation $H_k = B_k^{-1}$ satisfies:

$$H_{k+1} = H_k + \frac{(s_k - H_k y_k)\, s_k^T H_k}{s_k^T H_k y_k}$$
With this approach, each iteration costs $O(n^2)$: the update involves only matrix-vector products, and the step $s_k = -H_k F(x_k)$ is a matrix-vector multiply instead of a linear solve.
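The inverse-form iteration could then look like this (a sketch building on `F`, `x0`, and `fd_jacobian` from the routine above; `H0` would typically be the inverse of a finite-difference Jacobian at $x_0$):

```python
def broyden_inverse(F, x0, H0, tol=1e-10, max_iter=100):
    """Broyden's method maintaining H_k = B_k^{-1}: O(n^2) work per iteration."""
    x = np.asarray(x0, dtype=float)
    H = np.array(H0, dtype=float)
    Fx = F(x)
    for _ in range(max_iter):
        s = -H @ Fx                              # matrix-vector product, no solve
        x = x + s
        Fx_new = F(x)
        if np.linalg.norm(Fx_new) <= tol:
            return x
        y = Fx_new - Fx
        Hy = H @ y
        # Sherman-Morrison form of the update:
        # H_{k+1} = H_k + (s_k - H_k y_k) s_k^T H_k / (s_k^T H_k y_k)
        H += np.outer(s - Hy, s @ H) / (s @ Hy)
        Fx = Fx_new
    raise RuntimeError("Broyden's method did not converge")

root = broyden_inverse(F, x0, np.linalg.inv(fd_jacobian(F, x0)))
```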
## Convergence
Theorem 1 (Superlinear Convergence of Broyden’s Method)

Suppose $F(x^*) = 0$, $J(x^*)$ is nonsingular, and $J$ is Lipschitz continuous near $x^*$. If $x_0$ is sufficiently close to $x^*$ and $B_0$ is sufficiently close to $J(x^*)$, then Broyden’s method converges superlinearly:

$$\lim_{k \to \infty} \frac{\|x_{k+1} - x^*\|}{\|x_k - x^*\|} = 0$$
Proof 2

Write $e_k = x_k - x^*$ and $J_* = J(x^*)$. The Newton step with exact Jacobian would give $\|e_{k+1}\| = O(\|e_k\|^2)$ (quadratic). With Broyden’s approximation, the error satisfies:

$$e_{k+1} = e_k - B_k^{-1} F(x_k)$$

Adding and subtracting $J(x_k)^{-1} F(x_k)$:

$$e_{k+1} = \bigl(e_k - J(x_k)^{-1} F(x_k)\bigr) + \bigl(J(x_k)^{-1} - B_k^{-1}\bigr) F(x_k)$$

The first term is $O(\|e_k\|^2)$ from the standard Newton analysis. The second term is bounded by a constant times $\|(B_k - J(x_k))\, e_k\|$ (since $J(x_k)^{-1} F(x_k) = e_k + O(\|e_k\|^2)$).

The key result (Dennis and Moré, 1974) is that the Broyden update satisfies:

$$B_{k+1} - J_* = \bigl(B_k - J_*\bigr)\left(I - \frac{s_k s_k^T}{s_k^T s_k}\right) + \frac{\bigl(y_k - J_* s_k\bigr)\, s_k^T}{s_k^T s_k}$$

The error in $B_k$ does not grow in the direction $s_k$ (the projection annihilates that component), and the term $y_k - J_* s_k$, which is $O\bigl(\|s_k\|(\|e_k\| + \|e_{k+1}\|)\bigr)$ by Lipschitz continuity, shrinks as we converge. A careful analysis (the Dennis-Moré characterization theorem) shows that the Jacobian approximation error satisfies:

$$\lim_{k \to \infty} \frac{\|(B_k - J_*)\, s_k\|}{\|s_k\|} = 0$$

This means $B_k$ becomes increasingly accurate in the directions that matter (the step directions), even though $\|B_k - J_*\|$ may not converge to zero overall. Combined with the Newton residual, this gives $\|e_{k+1}\| = o(\|e_k\|)$, i.e., superlinear convergence.
The convergence is superlinear but not quadratic because $B_k$ is only an approximation to the Jacobian. Newton’s method uses the exact Jacobian at each step, giving $O(\|e_k\|^2)$ error. Broyden’s method adds a Jacobian approximation error that is $o(\|e_k\|)$ (vanishing relative to the error) but not $O(\|e_k\|^2)$. The approximation improves in the step directions as the iteration progresses, which is enough for superlinear but not quadratic convergence.
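One way to observe this empirically is to record the error ratios $\|e_{k+1}\| / \|e_k\|$ along a run: for superlinear convergence they tend to zero, whereas for linear convergence they level off at a constant. A sketch reusing `F` and `fd_jacobian` from above, with the root assumed known so errors can be measured:

```python
# Instrumented run against the known root (1/sqrt(2), 1/sqrt(2)).
x_star = np.array([1.0, 1.0]) / np.sqrt(2.0)
x = np.array([1.0, 0.5])
B = fd_jacobian(F, x)
Fx = F(x)
errors = [np.linalg.norm(x - x_star)]
for _ in range(12):
    s = np.linalg.solve(B, -Fx)
    x = x + s
    Fx_new = F(x)
    B += np.outer(Fx_new - Fx - B @ s, s) / (s @ s)
    Fx = Fx_new
    errors.append(np.linalg.norm(x - x_star))
    if errors[-1] < 1e-14:                       # stop near machine precision
        break

# Superlinear: successive ratios ||e_{k+1}|| / ||e_k|| tend toward 0.
ratios = [e1 / e0 for e0, e1 in zip(errors, errors[1:])]
print(ratios)
```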
## Broyden’s “Good” and “Bad” Methods
The update above is called Broyden’s first method (or “good” Broyden). It updates $B_k$ to match the secant condition $B_{k+1} s_k = y_k$.
Broyden’s second method (or “bad” Broyden) instead updates the inverse $H_k$ directly to satisfy the inverse secant condition $H_{k+1} y_k = s_k$, using the analogous minimum-change principle (minimize $\|H - H_k\|_F$ subject to the constraint):

$$H_{k+1} = H_k + \frac{(s_k - H_k y_k)\, y_k^T}{y_k^T y_k}$$
The names are historical and misleading: neither method is universally better. Broyden’s first method tends to work better when the Jacobian is well conditioned; the second method can be preferable when it is ill conditioned, since it avoids solving with the approximation.
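In inverse form, the two methods differ only in the rank-1 correction applied to $H_k$. A side-by-side sketch, with `H`, `s`, `y` as in the routines above:

```python
# "Good" Broyden, inverse form (via Sherman-Morrison, as above):
#   H += np.outer(s - H @ y, s @ H) / (s @ (H @ y))

# "Bad" Broyden: minimum-change update of H itself, enforcing H_{k+1} y_k = s_k:
H += np.outer(s - H @ y, y) / (y @ y)
```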
## Error-Oriented Quasi-Newton (QNERR)
Broyden’s method is only locally convergent: it has no mechanism to detect or recover from a bad approximation. In Deuflhard’s framework (see Globalization), the natural combination of quasi-Newton with globalization is QNERR: the error-oriented quasi-Newton method.
In NLEQ_ERR, once the contraction factor $\Theta_k$ drops below a threshold (typically $0.5$), the iteration is converging well and recomputing the Jacobian is wasteful. QNERR freezes the Jacobian and applies rank-1 corrections using the iteration history, while continuing to monitor $\Theta_k$. If the contraction degrades ($\Theta_k \ge \frac{1}{2}$), QNERR hands control back to NLEQ_ERR, which recomputes the Jacobian.
This gives a hybrid strategy:

- Phase 1 (NLEQ_ERR): full Newton with damping until contraction is established ($\Theta_k < \frac{1}{2}$).
- Phase 2 (QNERR): freeze the Jacobian, apply rank-1 corrections, monitor $\Theta_k$. If contraction degrades, switch back to Phase 1.
Unlike standalone Broyden, QNERR always starts from the true Jacobian (computed in the Newton phase) and has a built-in safety net through $\Theta_k$ monitoring. It also inherits the affine invariance of NLEQ_ERR, since it monitors $\|\Delta x_k\|$ rather than $\|F(x_k)\|$.
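The following is a heavily simplified sketch of this two-phase idea, not Deuflhard’s actual QNERR (which uses simplified Newton corrections, affine-invariant norms, and damping in Phase 1, all omitted here). It measures the contraction factor from successive step lengths and reuses `fd_jacobian` from above:

```python
def hybrid_newton_broyden(F, x0, tol=1e-10, theta_max=0.5, max_iter=100):
    """Simplified two-phase iteration: fresh Jacobians (Phase 1) until the
    steps contract, then a frozen Jacobian with Broyden corrections (Phase 2)."""
    x = np.asarray(x0, dtype=float)
    B = fd_jacobian(F, x)                  # always start from a true (FD) Jacobian
    Fx = F(x)
    s_prev = None
    quasi = False                          # False: Phase 1 (Newton), True: Phase 2
    for _ in range(max_iter):
        s = np.linalg.solve(B, -Fx)
        if s_prev is not None:
            theta = np.linalg.norm(s) / np.linalg.norm(s_prev)  # contraction factor
            if theta < theta_max:
                quasi = True               # contraction established: freeze Jacobian
            elif quasi:
                B = fd_jacobian(F, x)      # contraction degraded: back to Phase 1
                quasi = False
                s = np.linalg.solve(B, -Fx)
        x = x + s
        Fx_new = F(x)
        if np.linalg.norm(Fx_new) <= tol:
            return x
        if quasi:
            # Rank-1 Broyden correction; y_k - B_k s_k = F(x_{k+1}) since B_k s_k = -F(x_k)
            B += np.outer(Fx_new, s) / (s @ s)
        else:
            B = fd_jacobian(F, x)          # Phase 1: recompute the Jacobian each step
        Fx, s_prev = Fx_new, s
    raise RuntimeError("hybrid iteration did not converge")
```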