Optimization Algorithms (Opti)

This section describes the optimization algorithm classes, all of which derive from the abstract class Opti.

OptiADMM

class Opti.OptiADMM(F0, Fn, Hn, rho_n, solver)

Bases: Abstract.Opti

Alternating Direction Method of Multipliers [1] algorithm which minimizes Cost of the form $$ C(\mathrm{x}) = F_0(\mathrm{x}) + \sum_{n=1}^N F_n(\mathrm{H_n x}) $$

Parameters:
  • F_0 – a Cost object

  • F_n – cell of N Cost with an implementation of the prox() for each one

  • H_n – cell of N LinOp

  • rho_n – array of N positive scalars

  • maxiterCG – maximal number of inner conjugate gradient (CG) iterations (when required, default 20)

  • solver – a handle function taking parameters solver(z_n,rho_n,x0) (see the note below)

  • OutOpCG – OutputOpti object for CG (when used)

  • ItUpOutCG – ItUpOut parameter for CG (when used, default 0)

All attributes of parent class Opti are inherited.

Principle The algorithm minimizes the augmented Lagrangian of the above problem, $$ \mathcal{L}(\mathrm{x,y_1 \ldots y_N,w_1 \ldots w_N}) = F_0(\mathrm{x}) + \sum_{n=1}^N \frac12\rho_n\Vert \mathrm{H_nx - y_n + w_n/\rho_n} \Vert^2 + F_n(\mathrm{y_n}), $$ using an alternating minimization scheme [1].

Note The minimization of \(\mathcal{L}\) over \(\mathrm{x}\), $$ F_0(\mathrm{x}) + \sum_{n=1}^N \frac12\rho_n\Vert \mathrm{H_nx -z_n}\Vert^2, \quad \mathrm{z_n= y_n - w_n/\rho_n}, $$ is performed either with the conjugate-gradient algorithm OptiConjGrad, with a direct inversion, or with the given solver:

  • If \(F_0\) is empty or is a CostL2, and the LinOp \(\sum_{n=0}^N \mathrm{H_n}^*\mathrm{H_n}\) cannot be inverted directly, then OptiConjGrad is used by default unless a more efficient solver is provided. Here \(\mathrm{H_0}\) denotes the LinOp associated with \(F_0\).

  • Otherwise the solver is required.
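To make the alternating scheme concrete, here is a minimal, self-contained Python sketch of scaled-form ADMM on a toy instance with \(N=1\), \(\mathrm{H_1}\) the identity, \(F_0 = \frac12\Vert \mathrm{x-y}\Vert^2\) and \(F_1 = \lambda\Vert\cdot\Vert_1\). The names `admm_lasso` and `soft` are illustrative and not part of the library; in general the x-step is solved by CG or the user-supplied solver rather than in closed form.

```python
import numpy as np

def soft(v, t):
    """Soft-thresholding: prox of t*||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def admm_lasso(y, lam, rho=1.0, n_iter=200):
    """Scaled-form ADMM for min_x 0.5*||x-y||^2 + lam*||x||_1
    (N=1, H_1 = Identity), whose solution is soft(y, lam)."""
    x = np.zeros_like(y); z = np.zeros_like(y); w = np.zeros_like(y)
    for _ in range(n_iter):
        # x-step: quadratic minimization (closed form here, CG in general)
        x = (y + rho * (z - w)) / (1.0 + rho)
        # z-step: proximal operator of F_1
        z = soft(x + w, lam / rho)
        # dual ascent on the scaled multiplier w = w_n / rho_n
        w = w + x - z
    return x

x = admm_lasso(np.array([3.0, 0.5, -2.0]), lam=1.0)
```

For this toy problem the iterates converge to the soft-thresholded data, which makes the three-step structure of each iteration easy to check against the augmented Lagrangian above.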

Reference

[1] Boyd, Stephen, et al. “Distributed optimization and statistical learning via the alternating direction method of multipliers.” Foundations and Trends in Machine Learning, 2011.

Example ADMM=OptiADMM(F0,Fn,Hn,rho_n,solver)

See also Opti, OptiConjGrad, OutputOpti, Cost

initialize(this, x0)

Reimplementation from Opti.

doIteration(this)

Reimplementation from Opti. For details see [1].

OptiChambPock

class Opti.OptiChambPock(F, H, G)

Bases: Abstract.Opti

Chambolle-Pock optimization algorithm [1] which minimizes Cost of the form $$ C(\mathrm{x}) = F(\mathrm{Hx}) + G(\mathrm{x}) $$

Parameters:
  • F – a Cost with an implementation of the prox().

  • G – a Cost with an implementation of the prox().

  • H – a LinOp.

  • tau – parameter of the algorithm (default 1)

  • sig – parameter of the algorithm which is computed automatically if H.norm is different from -1.

  • var

    select the “bar” variable of the algorithm (see [1]):

    • if 1 (default) then the primal variable \(\bar{\mathrm{x}} = 2\mathrm{x}_n - \mathrm{x}_{n-1}\) is used

    • if 2 then the dual variable \(\bar{\mathrm{y}} = 2\mathrm{y}_n - \mathrm{y}_{n-1} \) is used

  • gam

    if non-empty, accelerated version (see [1]). Here, \(G\) or \(F^*\) is uniformly convex with \(\nabla G^*\) or \(\nabla F\) 1/gam-Lipschitz:

    • If \(G\) is uniformly convex then set the parameter var to 1.

    • If \(F^*\) is uniformly convex then set the parameter var to 2

All attributes of parent class Opti are inherited.

Note-1: In fact, \(F\) only needs the prox of its Fenchel conjugate, prox_fench() (which is available as soon as \(F\) has an implementation of the prox, see Cost).
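Note-1 relies on Moreau's identity, \(\mathrm{prox}_{\sigma F^*}(v) = v - \sigma\, \mathrm{prox}_{F/\sigma}(v/\sigma)\), which gives the prox of the conjugate from the prox of \(F\). A small Python sketch (function names illustrative, with \(F = \Vert\cdot\Vert_1\) as an example, whose conjugate prox is the projection onto \([-1,1]\)):

```python
import numpy as np

def soft(v, t):
    # prox of t*||.||_1 (soft-thresholding)
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def prox_conjugate(v, sigma, prox):
    """Moreau identity: prox_{sigma*F^*}(v) = v - sigma*prox_{F/sigma}(v/sigma),
    so the prox of the Fenchel conjugate comes for free from the prox of F."""
    return v - sigma * prox(v / sigma, 1.0 / sigma)

v = np.array([3.0, 0.5, -2.0])
# For F = ||.||_1, F^* is the indicator of the L-inf unit ball, so
# prox_{sigma*F^*} is the projection onto [-1, 1], whatever sigma is.
p = prox_conjugate(v, sigma=2.0, prox=soft)
```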

Note-2:

  • To ensure convergence (see [1]), parameters sig and tau have to verify $$ \sigma \times \tau \times \Vert \mathrm{H} \Vert^2 < 1 $$ where \(\Vert \mathrm{H}\Vert\) denotes the norm of the linear operator H.

  • When the accelerated version is used (i.e. parameter gam is non-empty), sig and tau will be updated at each iteration and the initial ones (given by user) have to verify $$ \sigma \times \tau \times \Vert \mathrm{H} \Vert^2 \leq 1 $$

Reference:

[1] Chambolle, Antonin, and Thomas Pock. “A first-order primal-dual algorithm for convex problems with applications to imaging.” Journal of Mathematical Imaging and Vision 40.1, pp 120-145 (2011).

Example CP=OptiChambPock(F,H,G)

See also Opti, OutputOpti, Cost

initialize(this, x0)

Reimplementation from Opti.

doIteration(this)

Reimplementation from Opti. For details see [1].

OptiConjGrad

class Opti.OptiConjGrad(A, b)

Bases: Abstract.Opti

Conjugate gradient optimization algorithm which solves the linear system \(\mathrm{Ax=b}\) by minimizing $$ C(\mathrm{x})= \frac12 \mathrm{x^TAx - b^Tx} $$

Parameters:
  • A – symmetric positive definite LinOp

  • b – right-hand term

All attributes of parent class Opti are inherited.
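For reference, here is a textbook conjugate-gradient sketch in Python; this is a generic illustration of the method, not the library implementation:

```python
import numpy as np

def conj_grad(A, b, x0=None, tol=1e-10, max_iter=50):
    """Plain conjugate gradient for A x = b, with A symmetric
    positive definite. Minimizes C(x) = 0.5*x^T A x - b^T x."""
    x = np.zeros_like(b) if x0 is None else x0.copy()
    r = b - A @ x          # residual, equals -gradient of C
    p = r.copy()           # first search direction
    rs = r @ r
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rs / (p @ Ap)      # exact line search along p
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p  # A-conjugate direction update
        rs = rs_new
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = conj_grad(A, b)
```

In exact arithmetic CG terminates in at most \(n\) iterations for an \(n\times n\) system; in practice the residual tolerance stops it earlier.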

Example CG=OptiConjGrad(A,b,OutOp)

See also Opti, OutputOpti, Cost

initialize(this, x0)

Reimplementation from Opti.

doIteration(this)

Reimplementation from Opti. For a detailed algorithm scheme see here

OptiDouglasRachford

class Opti.OptiDouglasRachford(F1, F2, L, gamma, lambda)

Bases: Abstract.Opti

Douglas-Rachford splitting optimization algorithm which minimizes a Cost of the form $$ C(\mathrm{x}) = F_1(\mathrm{x}) + F_2(\mathrm{L x}) $$

Parameters:
  • F_1 – a Cost with an implementation of the prox().

  • F_2 – a Cost with an implementation of the prox().

  • L – a LinOp such that \(\mathrm{LL^T} = \nu \mathrm{I} \)

  • gamma – \(\in [0,+\infty[\)

  • lambda – \(\in ]0,2[\) the relaxation parameter

All attributes of parent class Opti are inherited.
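The iteration can be sketched as follows in Python on a toy problem with \(\mathrm{L}\) the identity (\(\nu = 1\)), \(F_1 = \frac12\Vert\cdot - \mathrm{b}\Vert^2\) and \(F_2\) an \(\ell_1\) penalty; the names `douglas_rachford` and `soft` are illustrative, not the library API:

```python
import numpy as np

def soft(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def douglas_rachford(b, reg, gamma=1.0, lam=1.0, n_iter=200):
    """Douglas-Rachford sketch for min_x F1(x) + F2(L x) with
    F1 = 0.5*||x-b||^2, F2 = reg*||.||_1 and L = Identity (nu = 1).
    gamma > 0 and lam in ]0,2[ are the parameters documented above."""
    x = np.zeros_like(b)
    for _ in range(n_iter):
        y = (x + gamma * b) / (1.0 + gamma)   # prox_{gamma*F1}(x)
        z = soft(2.0 * y - x, gamma * reg)    # prox_{gamma*F2}(2y - x)
        x = x + lam * (z - y)                 # relaxed update
    return y  # the y-sequence converges to the minimizer

x = douglas_rachford(np.array([3.0, 0.5, -2.0]), reg=1.0)
```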

Example DR=OptiDouglasRachford(F1, F2, L, gamma, lambda)

See also Opti, OutputOpti, Cost

initialize(this, x0)

Reimplementation from Opti.

doIteration(this)

Reimplementation from Opti.

OptiFBS

class Opti.OptiFBS(F, G)

Bases: Abstract.Opti

Forward-Backward Splitting optimization algorithm [1] which minimizes Cost of the form $$ C(\mathrm{x}) = F(\mathrm{x}) + G(\mathrm{x}) $$

Parameters:
  • F – a differentiable Cost (i.e. with an implementation of applyGrad()).

  • G – a Cost with an implementation of the applyProx().

  • gam – descent step

  • fista – boolean true if the accelerated version FISTA [3] is used (default false)

  • momRestart – boolean true if the moment restart strategy is used [4] (default false)

  • updateGam – Rule for updating gamma (none : default, reduced : the parameter gam is decreased according to \(\gamma / \sqrt{k} \), backtracking : backtracking rule following [3])

  • eta – parameter greater than 1 that is used with backtracking (see [3])

All attributes of parent class Opti are inherited.

Note: When the functionals are convex and \(F\) has a Lipschitz continuous gradient, convergence is ensured by taking \(\gamma \in (0,2/L] \) where \(L\) is the Lipschitz constant of \(\nabla F\) (see [1]). When FISTA is used [3], \(\gamma \) should be in \((0,1/L]\). For nonconvex functionals, take \(\gamma \in (0,1/L]\) [2]. If \(L\) is known (i.e. F.lip different from -1), the parameter \(\gamma\) is automatically set to \(1/L\).
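The forward (gradient) and backward (prox) steps, with the optional FISTA momentum, can be sketched in Python on a small \(\ell_1\)-regularized least-squares problem; `fbs` and `soft` are illustrative names, not the library API:

```python
import numpy as np

def soft(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def fbs(A, b, reg, n_iter=500, fista=True):
    """Forward-backward splitting sketch for
    min_x 0.5*||Ax-b||^2 + reg*||x||_1  (F smooth, G = reg*||.||_1).
    Step gam = 1/L with L = ||A^T A|| (Lipschitz constant of grad F);
    fista=True adds the Beck-Teboulle momentum of [3]."""
    L = np.linalg.norm(A.T @ A, 2)
    gam = 1.0 / L
    x = np.zeros(A.shape[1]); y = x.copy(); t = 1.0
    for _ in range(n_iter):
        grad = A.T @ (A @ y - b)                 # forward (gradient) step on F
        x_new = soft(y - gam * grad, gam * reg)  # backward (prox) step on G
        if fista:
            t_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
            y = x_new + ((t - 1.0) / t_new) * (x_new - x)  # momentum
            t = t_new
        else:
            y = x_new
        x = x_new
    return x

A = np.diag([1.0, 2.0])
x = fbs(A, b=np.array([3.0, 1.0]), reg=0.5)
```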

References:

[1] P.L. Combettes and V.R. Wajs, “Signal recovery by proximal forward-backward splitting”, SIAM Journal on Multiscale Modeling & Simulation, vol 4, no. 4, pp 1168-1200, (2005).

[2] Hedy Attouch, Jerome Bolte and Benar Fux Svaiter, “Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward-backward splitting, and regularized Gauss-Seidel methods.” Mathematical Programming, 137 (2013).

[3] Amir Beck and Marc Teboulle, “A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems”, SIAM Journal on Imaging Sciences, vol 2, no. 1, pp 182-202 (2009).

[4] Brendan O’Donoghue and Emmanuel Candès, “Adaptive Restart for Accelerated Gradient Schemes”, Foundations of Computational Mathematics, vol 15, no. 3, pp 715-732 (June 2015).

Example FBS=OptiFBS(F,G)

See also Opti, OutputOpti, Cost

initialize(this, x0)

Reimplementation from Opti.

doIteration(this)

Reimplementation from Opti. For details see [1-3].

OptiFGP

class Opti.OptiFGP(F0, TV, bounds)

Bases: Abstract.Opti

Fast Gradient Proximal algorithm which computes the TV proximity operator, i.e. minimizes a Cost of the form $$ C(\mathrm{x}) = \frac12\|\mathrm{x} - \mathrm{y}\|^2_2 + \lambda \|\mathrm{x} \|_{TV} $$

Parameters:
  • F_0 – a CostL2 object

  • TV – a CostTV object

  • bounds – bounds for set constraint

  • gam – descent step (default 1/8)

  • lambda – regularization parameter for TV

All attributes of parent class Opti are inherited.
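The dual projected-gradient step at the heart of FGP can be illustrated in 1D; this is a simplified, non-accelerated Python sketch (names `tv_prox_fgp`, variables `p`, `x` illustrative), and note that the 1/8 default step of the class corresponds to the 2D gradient operator, while \(\Vert \mathrm{D}\Vert^2 \leq 4\) gives 1/4 in 1D:

```python
import numpy as np

def tv_prox_fgp(y, lam, n_iter=300):
    """1D sketch of dual gradient projection for the TV prox:
    min_x 0.5*||x - y||^2 + lam*TV(x).  One dual variable p_i per finite
    difference; each step is a gradient step on the dual followed by a
    projection onto [-1, 1] (plain, non-accelerated version for brevity)."""
    p = np.zeros(y.size - 1)
    for _ in range(n_iter):
        # primal estimate x = y - lam * D^T p
        x = y - lam * np.concatenate(([-p[0]], p[:-1] - p[1:], [p[-1]]))
        # projected dual gradient step; ||D||^2 <= 4 in 1D, hence 1/(4*lam)
        p = np.clip(p + (1.0 / (4.0 * lam)) * np.diff(x), -1.0, 1.0)
    return y - lam * np.concatenate(([-p[0]], p[:-1] - p[1:], [p[-1]]))

# with a large lam the TV prox flattens the signal to its mean
x = tv_prox_fgp(np.array([1.0, 2.0, 3.0]), lam=10.0)
```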

References

[1] Beck, A., and Teboulle, M. (2009). Fast gradient-based algorithms for constrained total variation image denoising and deblurring problems. IEEE Transactions on Image Processing, 18(11), 2419-2434.

Example FGP=OptiFGP(F0,TV,bounds)

See also Opti, OutputOpti, Cost

setLambda(this, new_l)

Set the regularization parameter lambda

setBounds(this, new_b)

Set the constraint bounds

initialize(this, x0)

Reimplementation from Opti.

doIteration(this)

Reimplementation from Opti. For details see [1].

OptiGradDsct

class Opti.OptiGradDsct(F)

Bases: Abstract.Opti

Gradient Descent optimization algorithm to minimize a differentiable Cost \(C(\mathrm{x})\)

Parameters:
  • C – a differentiable Cost (i.e. with an implementation of applyGrad()).

  • gam – descent step

  • nagd – boolean (default false) to activate the Nesterov accelerated gradient descent

All attributes of parent class Opti are inherited.

Note If the cost \(C\) has a Lipschitz-continuous gradient, convergence is ensured by taking \(\gamma \in (0,2/L] \) where \(L\) is the Lipschitz constant of \(\nabla C\) (see [1]). The optimal choice is \(\gamma = 1/L \) (see [1]). If \(L\) is known (i.e. F.lip different from -1), the parameter \(\gamma\) is automatically set to \(1/L\).
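The update rule and the \(\gamma = 1/L\) choice can be sketched in Python on a quadratic cost; `grad_dsct` is an illustrative name, and the `nesterov` flag mimics the accelerated variant (the nagd option) with the classical momentum sequence:

```python
import numpy as np

def grad_dsct(grad_C, L, x0, n_iter=200, nesterov=False):
    """Gradient descent sketch with the step gam = 1/L discussed above;
    nesterov=True adds the classical Nesterov momentum."""
    gam = 1.0 / L
    x = x0.copy(); y = x0.copy(); t = 1.0
    for _ in range(n_iter):
        x_new = y - gam * grad_C(y)   # x^{k+1} = x^k - gam * grad C(x^k)
        if nesterov:
            t_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
            y = x_new + ((t - 1.0) / t_new) * (x_new - x)
            t = t_new
        else:
            y = x_new
        x = x_new
    return x

# quadratic example: C(x) = 0.5*x^T A x - b^T x, grad C = A x - b, L = ||A||
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
L = np.linalg.norm(A, 2)
x = grad_dsct(lambda v: A @ v - b, L, np.zeros(2))
```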

Reference

[1] Nesterov, Yurii. “Introductory lectures on convex programming.” Lecture Notes (1998): 119-120.

Example GD=OptiGradDsct(F)

See also Opti, OutputOpti, Cost

initialize(this, x0)

Reimplementation from Opti.

doIteration(this)

Reimplementation from Opti. Performs: $$ \mathrm{x}^{k+1} = \mathrm{x}^k - \gamma \nabla C(\mathrm{x}^k) $$

OptiPrimalDualCondat

class Opti.OptiPrimalDualCondat(F0, G, Fn, Hn)

Bases: Abstract.Opti

Primal-Dual algorithm proposed by L. Condat in [1] which minimizes Cost of the form $$ C(\mathrm{x})= F_0(\mathrm{x}) + G(\mathrm{x}) + \sum_n F_n(\mathrm{H_nx}) $$

Parameters:
  • F_0 – a differentiable Cost (i.e. with an implementation of grad()).

  • G – a Cost with an implementation of the prox().

  • F_n – cell of N Cost with an implementation of the prox() for each one

  • H_n – cell of N LinOp

  • tau – parameter of the algorithm (see the note below)

  • sig – parameter of the algorithm (see the note below)

  • rho – parameter of the algorithm (see the note below)

All attributes of parent class Opti are inherited.

Note:

  • When \(F_0=0\), parameters sig and tau have to verify $$ \sigma \times \tau \Vert \sum_n \mathrm{H_n^*H_n} \Vert \leq 1 $$ and \(\rho \in ]0,2[\), to ensure convergence (see [1, Theorem 5.3]).

  • Otherwise, when \(F_0\neq 0\), parameters sig and tau have to verify $$ \frac{1}{\tau} - \sigma \times \Vert \sum_n \mathrm{H_n^*H_n} \Vert \geq \frac{\beta}{2} $$ where \(\beta\) is the Lipschitz constant of \(\nabla F\) and we need \(\rho \in ]0,\delta[ \) with $$ \delta = 2 - \frac{\beta}{2}\times\left(\frac{1}{\tau} - \sigma \times \Vert \sum_n \mathrm{H_n^*H_n} \Vert\right)^{-1} \in [1,2[ $$ to ensure convergence (see [1, Theorem 5.1]).
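A minimal Python sketch of the iteration on a toy problem helps check the parameter conditions above: take \(F_0 = \frac12\Vert\cdot-\mathrm{b}\Vert^2\) (so \(\beta = 1\)), \(G = 0\), and a single \(F_1 = \lambda\Vert\cdot\Vert_1\) with \(\mathrm{H_1}\) the identity. The name `condat` and the toy parameters are illustrative, not the library API:

```python
import numpy as np

def condat(b, reg, tau=0.5, sig=1.0, rho=1.0, n_iter=500):
    """Sketch of Condat's primal-dual iteration for
    min_x 0.5*||x-b||^2 + reg*||x||_1, i.e. F0 = 0.5*||.-b||^2 (beta = 1),
    G = 0 and a single F1 = reg*||.||_1 with H1 = Identity.
    Here 1/tau - sig*||H||^2 = 1 >= beta/2, as required by the note above."""
    x = np.zeros_like(b); y = np.zeros_like(b)
    for _ in range(n_iter):
        # primal step: gradient of F0 plus the dual term (prox of G = identity)
        x_t = x - tau * ((x - b) + y)
        # dual step: prox of the conjugate of reg*||.||_1,
        # i.e. the projection onto [-reg, reg]
        y_t = np.clip(y + sig * (2.0 * x_t - x), -reg, reg)
        # relaxation with parameter rho
        x = rho * x_t + (1.0 - rho) * x
        y = rho * y_t + (1.0 - rho) * y
    return x

x = condat(np.array([3.0, 0.5, -2.0]), reg=1.0)
```

With these values, \(\delta = 2 - \frac{1}{2}(1/\tau - \sigma)^{-1} = 1.5\), so the relaxation \(\rho = 1\) satisfies \(\rho \in ]0,\delta[\).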

Reference

[1] Laurent Condat, “A Primal-Dual Splitting Method for Convex Optimization Involving Lipchitzian, Proximable and Linear Composite Terms”, Journal of Optimization Theory and Applications, vol 158, no 2, pp 460-479 (2013).

Example A=OptiPrimalDualCondat(F0,G,Fn,Hn)

See also Opti, OutputOpti, Cost

initialize(this, x0)

Reimplementation from Opti.

doIteration(this)

Reimplementation from Opti. For details see [1].

OptiRichLucy

class Opti.OptiRichLucy(F, TV, lamb)

Bases: Abstract.Opti

Richardson-Lucy algorithm [1,2] which minimizes the Kullback-Leibler divergence CostKullLeib (possibly with TV regularization [3]): $$ C(\mathrm{x})= F(\mathrm{x}) + \lambda \Vert \mathrm{x} \Vert_{\mathrm{TV}} $$

Parameters:
  • F – a CostKullLeib object or a CostComposition with a CostKullLeib and a LinOp

  • TV – boolean, true if TV regularization is used (default false)

  • lambda – regularization parameter (when TV used)

  • epsl – smoothing parameter to make TV differentiable at 0 (default \(10^{-6}\))

All attributes of parent class Opti are inherited.

Note An interesting property of this algorithm is that it ensures the positivity of the solution from any positive initialization. However, when TV is used, the positivity of the iterates is not ensured anymore if \(\lambda \) is too large. Hence, \(\lambda \) needs to be carefully chosen.
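The positivity-preserving property is visible in the classical multiplicative update (without the TV term), sketched here in Python on a toy forward operator; `richardson_lucy` is an illustrative name, not the library API:

```python
import numpy as np

def richardson_lucy(H, y, n_iter=200):
    """Multiplicative Richardson-Lucy update (no TV term):
    x <- x * H^T(y / (H x)) / (H^T 1).  Every factor is positive, so the
    iterates stay positive for any positive initialization, as noted above."""
    x = np.ones(H.shape[1])
    Ht1 = H.T @ np.ones(H.shape[0])
    for _ in range(n_iter):
        x = x * (H.T @ (y / (H @ x))) / Ht1
    return x

H = np.array([[0.8, 0.2], [0.2, 0.8]])
x_true = np.array([2.0, 5.0])
x = richardson_lucy(H, H @ x_true)   # noiseless data: recovers x_true
```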

References

[1] Lucy, Leon B. “An iterative technique for the rectification of observed distributions.” The Astronomical Journal (1974).

[2] Richardson, William Hadley. “Bayesian-based iterative method of image restoration.” JOSA (1972): 55-59.

[3] N. Dey et al. “Richardson-Lucy Algorithm With Total Variation Regularization for 3D Confocal Microscope Deconvolution.” Microscopy research and technique (2006).

Example RL=OptiRichLucy(F,TV,lamb)

See also Opti, OutputOpti, Cost, CostKullLeib

initialize(this, x0)

Reimplementation from Opti.

doIteration(this)

Reimplementation from Opti. For details see [1-3].

OptiVMLMB

class Opti.OptiVMLMB(C, xmin, xmax)

Bases: Abstract.Opti

Variable Metric Limited Memory Bounded (VMLMB) algorithm from OptimPackLegacy [1]. This algorithm minimizes a differentiable cost \(C(\mathrm{x})\), with optional bound constraints and/or preconditioning.

Parameters:
  • C – minimized cost

  • xmin – min bound (optional)

  • xmax – max bound (optional)

All attributes of parent class Opti are inherited.

Note This Optimizer has many other variables that are set by default to reasonable values. See the function m_vmlmb_first.m in the MatlabOptimPack folder for more details.

Reference

[1] Eric Thiebaut, “Optimization issues in blind deconvolution algorithms”, SPIE Conf. Astronomical Data Analysis II, 4847, 174-183 (2002). See OptimPackLegacy repository.

Example VMLMB=OptiVMLMB(C,xmin,xmax)

See also Opti, OptiConjGrad, OutputOpti, Cost

initialize(this, x0)

Reimplementation from Opti.

doIteration(this)

Reimplementation from Opti. For details see [1].

OutputOpti

OutputOpti (Default)

class Opti.OutputOpti.OutputOpti(varargin)

Bases: matlab.mixin.Copyable

OutputOpti class for displaying and saving algorithm information

At every ItUpOut iterations of an optimization algorithm (see the generic Opti class), the update method of an OutputOpti object is executed in order to carry out user-defined computations, e.g.,

  • compute cost / SNR

  • store current iterate / cost value

  • plot/display results

The present generic class implements a basic update method that:

  • displays the iteration number

  • computes & displays the cost (if activated)

  • computes & displays the SNR (if a ground truth is provided)

Parameters:
  • name – name of the OutputOpti

  • computecost – boolean, if true the cost function will be computed

  • evolcost – array to save the evolution of the cost function

  • saveXopt – boolean (default false) to save the evolution of the optimized variable xopt.

  • evolxopt – cell saving the optimization variable xopt

  • iterVerb – message displayed every iterVerb iterations (must be a multiple of the ItUpOut parameter of Opti classes)

  • costIndex – select a specific cost function among a sum in the case where the optimized cost function is a sum of cost functions

Example OutOpti=OutputOpti(computecost,iterVerb,costIndex)

Important The update method should have a unique input, namely the Opti object, in order to be generic for all optimization routines. Hence the update method has access (in reading mode) to all the properties of Opti objects.

See also Opti

init(this)

Initialize the arrays and counters.

update(this, opti)

Computes SNR, cost and display evolution.

computeCost(this, opti)

Evaluates the cost function at the current iterate xopt of the given Opti object

OutputOptiConjGrad

class Opti.OutputOpti.OutputOptiConjGrad(computecost, yty, xtrue, iterVerb)

Bases: Opti.OutputOpti.OutputOptiSNR

OutputOptiConjGrad class for displays and savings dedicated to OptiConjGrad

The conjugate gradient algorithm minimizes the function $$ C(\mathrm{x})= \frac12 \mathrm{x^TAx - b^Tx} $$ However, it is often used in practice to minimize $$ F(\mathrm{x})= \frac12 \Vert \mathrm{Hx - y}\Vert^2_{\mathrm{W}} $$ by setting $$\mathrm{A} = \mathrm{H^T W H} \quad \text{and}\quad \mathrm{b = H^T W y}.$$ An OutputOptiConjGrad object computes the cost \(F\) instead of the cost \(C\).

Parameters:
  • computecost – boolean, if true the cost function will be computed

  • xtrue – ground truth to compute the error with the solution (if provided)

  • iterVerb – message displayed every iterVerb iterations (must be a multiple of the ItUpOut parameter of Opti classes)

  • ytWy – weighted squared norm of \(\mathrm{y}\): \(\mathrm{ytWy} = \mathrm{y^T W y}\)

See also OptiConjGrad OutputOpti

computeCost(this, opti)

Evaluates the cost function at the current iterate xopt of the given Opti object

TestCvg

TestCvg (Default)

class Opti.TestCvg.TestCvg

Bases: matlab.mixin.Copyable

TestCvg class monitoring a convergence criterion during optimization

At each iteration of an optimization algorithm (see the generic Opti class), the testConvergence() method of a TestCvg object is executed in order to carry out user-defined computations

Example CvOpti=TestCvg()

Important The testConvergence() method should have a unique input, namely the Opti object, in order to be generic for all optimization routines. Hence the method has access (in reading mode) to all the properties of Opti objects.

See also Opti

testConvergence(this, opti)

Default implementation: does nothing (the algorithm then stops only when the maximum number of iterations is reached).

TestCvgCombine

class Opti.TestCvg.TestCvgCombine(varargin)

Bases: Opti.TestCvg.TestCvg

TestCvgCombine: Combine several TestCvg objects.

Examples

  • CvOpti = TestCvgCombine(A, B, C); where A, B and C are TestCvg objects

  • CvOpti = TestCvgCombine(‘CostRelative’, 0.000001, ‘CostAbsolute’, 10); for simple tests

See also TestCvg

testConvergence(this, opti)

Reimplemented from parent class TestCvg.

TestCvgCostAbsolute

class Opti.TestCvg.TestCvgCostAbsolute(costAbsoluteTol, costIndex)

Bases: Opti.TestCvg.TestCvg

TestCvgCostAbsolute stops the optimization when the cost function is below the value COSTABSOLUTETOL

Parameters:
  • costAbsoluteTol – absolute tolerance on cost function

  • costIndex – select a specific cost function among a sum in the case where the optimized cost function is a sum of cost functions

Example CvOpti=TestCvgCostAbsolute(costAbsoluteTol, costIndex)

See also TestCvg

testConvergence(this, opti)

Tests algorithm convergence

Returns:

boolean true if \( C(\mathrm{x^k}) < \mathrm{costAbsoluteTol}\)

TestCvgCostRelative

class Opti.TestCvg.TestCvgCostRelative(costRelativeTol, costIndex)

Bases: Opti.TestCvg.TestCvg

TestCvgCostRelative stops the optimization when the relative variation of the cost function is below the value COSTRELATIVETOL

Parameters:
  • costRelativeTol – relative tolerance on cost function

  • costIndex – select a specific cost function among a sum in the case where the optimized cost function is a sum of cost functions

Example CvOpti=TestCvgCostRelative(costRelativeTol, costIndex)

See also TestCvg

testConvergence(this, opti)

Tests algorithm convergence from the relative difference between two successive values of the cost function

Returns:

boolean true if $$ \frac{\left| C(\mathrm{x}^{k}) - C(\mathrm{x}^{k-1})\right|}{\left|C(\mathrm{x}^{k-1})\right|} < \mathrm{costRelativeTol}$$

TestCvgStepRelative

class Opti.TestCvg.TestCvgStepRelative(stepRelativeTol)

Bases: Opti.TestCvg.TestCvg

TestCvgStepRelative stops the optimization when the relative step between two successive iterates is below the value STEPRELATIVETOL

Parameters:

stepRelativeTol – relative tolerance on step

Example CvOpti=TestCvgStepRelative(stepRelativeTol)

See also TestCvg

testConvergence(this, opti)

Tests algorithm convergence from the relative difference between two successive iterates

Returns:

boolean true if $$ \frac{\| \mathrm{x}^{k} - \mathrm{x}^{k-1}\|}{\|\mathrm{x}^{k-1}\|} < \text{stepRelativeTol}.$$

TestCvgMaxSnr

class Opti.TestCvg.TestCvgMaxSnr(ref)

Bases: Opti.TestCvg.TestCvg

TestCvgMaxSnr stops the optimization when the SNR starts decreasing

Parameters:

ref – reference signal

Example CvOpti=TestCvgMaxSnr(ref)

See also TestCvg

testConvergence(this, opti)

Tests algorithm convergence using the SNR with respect to the reference signal.

Returns:

boolean true if $$ 20\log\left(\frac{\| \mathrm{ref}\|}{\| \mathrm{x}^{k} - \mathrm{ref}\|}\right)< 20\log\left(\frac{\| \mathrm{ref}\|}{\| \mathrm{x}^{k-1} - \mathrm{ref}\|}\right)$$

TestCvgADMM

class Opti.TestCvg.TestCvgADMM(eps_abs, eps_rel)

Bases: Opti.TestCvg.TestCvg

Test convergence by monitoring the primal and dual residuals of ADMM as described in [1].

Parameters:
  • eps_abs – absolute tolerance (default 0, see [1])

  • eps_rel – relative tolerance (default 1e-3, see [1])

Reference

[1] Boyd, Stephen, et al. “Distributed optimization and statistical learning via the alternating direction method of multipliers.” Foundations and Trends in Machine Learning, 2011.

Warning The termination criterion described in [1] requires applying the adjoint of Hn at every iteration, which may be costly.

Example CvOpti=TestCvgADMM(eps_abs,eps_rel)

See also TestCvg

testConvergence(this, opti)

Reimplemented from parent class TestCvg.