optimize._lsq.trf

Trust Region Reflective algorithm for least-squares optimization.

The algorithm is based on ideas from the paper [STIR]. The main idea is to account for the presence of the bounds by an appropriate scaling of the variables (or, equivalently, by changing the shape of the trust region). Let’s introduce a vector v:

       | ub[i] - x[i], if g[i] < 0 and ub[i] < np.inf
v[i] = | x[i] - lb[i], if g[i] > 0 and lb[i] > -np.inf
       | 1,            otherwise

where g is the gradient of the cost function and lb, ub are the bounds. Each component of v is the distance to the bound towards which the anti-gradient points (when this distance is finite). Define a scaling matrix D = diag(v**0.5). The first-order optimality conditions can then be stated as

D^2 g(x) = 0.

This means that components of the gradient must be zero for strictly interior variables and must point inside the feasible region for variables on a bound.
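As a minimal sketch (toy values, not the library’s own code), v and the scaled optimality measure D^2 g can be computed like this:

import numpy as np

x = np.array([0.5, 2.0, -1.0])
g = np.array([-3.0, 1.5, 0.2])          # gradient of the cost function
lb = np.array([-np.inf, 0.0, -np.inf])  # lower bounds
ub = np.array([1.0, np.inf, np.inf])    # upper bounds

v = np.ones_like(x)
mask_ub = (g < 0) & (ub < np.inf)   # anti-gradient points towards a finite upper bound
mask_lb = (g > 0) & (lb > -np.inf)  # anti-gradient points towards a finite lower bound
v[mask_ub] = ub[mask_ub] - x[mask_ub]
v[mask_lb] = x[mask_lb] - lb[mask_lb]

d = v**0.5                     # diagonal of D
scaled_optimality = d**2 * g   # D^2 g, zero at a first-order optimal point
print(v, scaled_optimality)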

Now consider this system of equations as a new optimization problem. If the point x is strictly interior (not on a bound), then the left-hand side is differentiable and the Newton step for it satisfies

(D^2 H + diag(g) Jv) p = -D^2 g

where H is the Hessian matrix (or its J^T J approximation in least squares) and Jv is the Jacobian matrix of v with components -1, 1 or 0, such that all elements of the matrix C = diag(g) Jv are non-negative. Introduce the change of variables x = D x_h (_h would be “hat” in LaTeX). In the new variables we have a Newton step satisfying

B_h p_h = -g_h,

where B_h = D H D + C and g_h = D g. In least squares B_h = J_h^T J_h, where J_h = J D. Note that J_h and g_h are the proper Jacobian and gradient with respect to the “hat” variables. To guarantee global convergence, we formulate a trust-region problem based on the Newton step in the new variables:

0.5 * p_h^T B_h p_h + g_h^T p_h -> min, ||p_h|| <= Delta

In the original space B = H + D^{-1} C D^{-1}, and the equivalent trust-region problem is

0.5 * p^T B p + g^T p -> min, ||D^{-1} p|| <= Delta

Here the meaning of the matrix D becomes clearer: it alters the shape of the trust region such that large steps towards the bounds are not allowed. In the implementation the trust-region problem is solved in “hat” space, but handling of the bounds is done in the original space (see below and read the code).
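A small numerical check of this equivalence (random data for illustration, not the solver’s code): for any step p_h in “hat” space, the step p = D p_h gives the same model value and the same constraint norm in the original space.

import numpy as np

rng = np.random.default_rng(0)
n = 4
J = rng.standard_normal((6, n))
H = J.T @ J                     # Gauss-Newton approximation of the Hessian
g = rng.standard_normal(n)      # gradient
d = rng.uniform(0.5, 2.0, n)    # diagonal of D (v**0.5)
C = np.diag(np.abs(g))          # stand-in for diag(g) Jv, non-negative by construction

D = np.diag(d)
B_h = D @ H @ D + C                            # "hat"-space matrix
B = H + np.diag(1 / d) @ C @ np.diag(1 / d)    # original-space matrix
g_h = d * g

p_h = rng.standard_normal(n)    # any step in "hat" space
p = d * p_h                     # the same step in the original space

m_hat = 0.5 * p_h @ B_h @ p_h + g_h @ p_h
m_orig = 0.5 * p @ B @ p + g @ p
assert np.allclose(m_hat, m_orig)                          # identical model values
assert np.isclose(np.linalg.norm(p_h), np.linalg.norm(p / d))  # ||p_h|| = ||D^{-1} p||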

The introduction of the matrix D does not allow the bounds to be ignored: the algorithm must keep its iterates strictly feasible (to satisfy the aforementioned differentiability). The parameter theta controls how far the step backs off from the boundary (see the code for details).
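A hedged sketch of this idea (the helper name step_back_from_bounds is made up for illustration; the actual routine differs in details): the step is truncated to a fraction theta of the distance to the first bound it would hit, so the new iterate stays strictly interior.

import numpy as np

def step_back_from_bounds(x, p, lb, ub, theta=0.995):
    # Largest alpha in (0, 1] such that x + alpha * p stays within [lb, ub].
    with np.errstate(divide="ignore", invalid="ignore"):
        to_ub = np.where(p > 0, (ub - x) / p, np.inf)
        to_lb = np.where(p < 0, (lb - x) / p, np.inf)
    alpha = min(1.0, np.min(np.minimum(to_ub, to_lb)))
    if alpha < 1.0:
        alpha *= theta  # step back so the new point is strictly interior
    return x + alpha * p

x = np.array([0.5, 0.5])
p = np.array([1.0, -0.2])
print(step_back_from_bounds(x, p, lb=np.array([0.0, 0.0]), ub=np.array([1.0, 1.0])))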

The algorithm employs another important trick. If the trust-region solution does not fit into the bounds, a search direction reflected from the first encountered bound is considered as well. For motivation and analysis refer to the [STIR] paper (and other papers by the authors). In practice it does not require much justification: the algorithm simply chooses the best step among three candidates: a constrained trust-region step, a reflected step and a constrained Cauchy step (a minimizer along -g_h in “hat” space, or -D^2 g in the original space).
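An illustrative sketch (much simplified relative to the real select_step) of choosing among candidate steps by the value of the quadratic model in “hat” space; the candidate with the lowest model value wins:

import numpy as np

def model_value(J_h, g_h, diag_h, s):
    # 0.5 * s^T B_h s + g_h^T s with B_h = J_h^T J_h + diag(diag_h)
    Js = J_h @ s
    return 0.5 * (Js @ Js + s @ (diag_h * s)) + g_h @ s

rng = np.random.default_rng(1)
J_h = rng.standard_normal((5, 3))
g_h = rng.standard_normal(3)
diag_h = np.abs(rng.standard_normal(3))

candidates = {                                    # placeholders standing in for
    "trust-region step": rng.standard_normal(3),  # the constrained TR step,
    "reflected step": rng.standard_normal(3),     # the reflected step and
    "Cauchy step": -0.1 * g_h,                    # the constrained Cauchy step
}
best = min(candidates, key=lambda name: model_value(J_h, g_h, diag_h, candidates[name]))
print(best)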

Another feature is that the trust-region radius control strategy is modified to account for the appearance of the diagonal matrix C (called diag_h in the code).

Note that all the described peculiarities vanish when the problem has no bounds: the algorithm then becomes a standard trust-region type algorithm, very similar to the ones implemented in MINPACK.

The implementation supports two methods of solving the trust-region problem. The first, called ‘exact’, applies an SVD to the Jacobian and then solves the problem very accurately using the algorithm described in [JJMore]. It is not applicable to large problems. The second, called ‘lsmr’, uses the 2-D subspace approach (sometimes called “indefinite dogleg”), where the problem is solved in a subspace spanned by the gradient and the approximate Gauss-Newton step found by scipy.sparse.linalg.lsmr. The 2-D trust-region problem is reformulated as a 4th-order algebraic equation and solved very accurately by numpy.roots. The subspace approach allows solving very large problems (up to a couple of million residuals on a regular PC), provided the Jacobian matrix is sufficiently sparse.
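These internal routines are reached through the public scipy.optimize.least_squares interface; a hedged usage sketch (the exponential model and data here are made up):

import numpy as np
from scipy.optimize import least_squares

def residuals(x, t, y):
    # residuals of the model x[0] * exp(x[1] * t) against data y
    return x[0] * np.exp(x[1] * t) - y

t = np.linspace(0, 1, 50)
y = 2.0 * np.exp(0.8 * t)

res = least_squares(residuals, x0=[1.0, 0.0], args=(t, y),
                    bounds=([0.0, -np.inf], [np.inf, 2.0]),
                    method="trf", tr_solver="lsmr")
print(res.x)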

References

[STIR] Branch, M. A., T. F. Coleman, and Y. Li, “A Subspace, Interior, and Conjugate Gradient Method for Large-Scale Bound-Constrained Minimization Problems,” SIAM Journal on Scientific Computing, Vol. 21, Number 1, pp. 1-23, 1999.
[JJMore] More, J. J., “The Levenberg-Marquardt Algorithm: Implementation and Theory,” Numerical Analysis, ed. G. A. Watson, Lecture Notes in Mathematics 630, Springer Verlag, pp. 105-116, 1977.

Module Contents

Functions

trf(fun, jac, x0, f0, J0, lb, ub, ftol, xtol, gtol, max_nfev, x_scale, loss_function, tr_solver, tr_options, verbose)

select_step(x, J_h, diag_h, g_h, p, p_h, d, Delta, lb, ub, theta)

    Select the best step according to Trust Region Reflective algorithm.

trf_bounds(fun, jac, x0, f0, J0, lb, ub, ftol, xtol, gtol, max_nfev, x_scale, loss_function, tr_solver, tr_options, verbose)

trf_no_bounds(fun, jac, x0, f0, J0, ftol, xtol, gtol, max_nfev, x_scale, loss_function, tr_solver, tr_options, verbose)