Operator

Tags: Operator, Optimization
Published: October 17, 2023
Author: Yonghai Gong

Proximal Algorithms

Proximal operator

  • Proximal operator of $f$ is
    $$ \mathbf{prox}_{\lambda f}(v) = \operatorname*{argmin}_x \Big( f(x) + \frac{1}{2\lambda} \|x - v\|_2^2 \Big) $$
    with parameter $\lambda > 0$
  • $f$ may be nonsmooth, have embedded constraints, …
  • evaluating $\mathbf{prox}_{\lambda f}$ involves solving a convex optimization problem
  • can evaluate via standard methods like BFGS, but very often has an analytical solution or a simple specialized linear-time algorithm
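As an illustrative sketch (not from the original notes): for $f(x) = \tfrac12 \|x\|_2^2$ the prox has the closed form $v/(1+\lambda)$, which we can check against a generic solver evaluation; scipy's BFGS is just one choice of standard method here.

```python
import numpy as np
from scipy.optimize import minimize

lam = 0.5
v = np.array([1.5, -0.3, 0.7])

# prox_{lam f}(v) for f(x) = (1/2)||x||_2^2: minimize f(x) + (1/(2 lam))||x - v||^2
obj = lambda x: 0.5 * x @ x + ((x - v) @ (x - v)) / (2 * lam)
x_bfgs = minimize(obj, np.zeros_like(v), method="BFGS").x

# analytical solution: prox_{lam f}(v) = v / (1 + lam)
x_exact = v / (1 + lam)
print(np.allclose(x_bfgs, x_exact, atol=1e-5))  # True, to solver tolerance
```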

Generalized projection

  • proximal operator of the indicator function $I_C$ of a convex set $C$ is projection:
    $$ \mathbf{prox}_{I_C}(v) = \operatorname*{argmin}_{x \in C} \|x - v\|_2 = \Pi_C(v) $$
    many properties carry over
  • if $f(x, y) = \varphi(x) + \psi(y)$ is separable across two variables, then
    $$ \mathbf{prox}_f(v, w) = \big( \mathbf{prox}_\varphi(v), \mathbf{prox}_\psi(w) \big) $$
  • if $f$ is block separable, so $f(x) = \sum_i f_i(x_i)$, then
    $$ \big( \mathbf{prox}_{\lambda f}(v) \big)_i = \mathbf{prox}_{\lambda f_i}(v_i) $$
  • example: if $f = \|\cdot\|_1$, then
    $$ \mathbf{prox}_{\lambda f}(v)_i = \begin{cases} v_i - \lambda & v_i \ge \lambda \\ 0 & |v_i| \le \lambda \\ v_i + \lambda & v_i \le -\lambda \end{cases} $$
    a simple elementwise operation called soft thresholding
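A minimal code sketch of soft thresholding (the function name is my own):

```python
import numpy as np

def soft_threshold(v, lam):
    """Elementwise prox of lam * ||.||_1, per the formula above."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

print(soft_threshold(np.array([1.5, -0.3, 0.7]), 0.5))  # [1.0, -0.0, 0.2]
```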

Fixed points

  • the point $x^\star$ minimizes $f$ if and only if it is a fixed point of $\mathbf{prox}_f$:
    $$ x^\star = \mathbf{prox}_f(x^\star) $$
  • provides a link between proximal operators and fixed point theory
  • many proximal algorithms can be viewed as methods for finding fixed points of appropriate operators

Moreau-Yosida regularization

  • infimal convolution of $f$ and $g$, denoted $f \,\square\, g$, is defined as
    $$ (f \,\square\, g)(v) = \inf_x \big( f(x) + g(v - x) \big) $$
  • Moreau envelope or Moreau-Yosida regularization of $f$ is
    $$ M_f = f \,\square\, \tfrac12 \|\cdot\|_2^2, \qquad M_{\lambda f}(v) = \inf_x \Big( f(x) + \frac{1}{2\lambda} \|x - v\|_2^2 \Big) $$
  • a smoothed or regularized form of $f$:
    • always has full domain
    • always continuously differentiable
    • has the same minimizers as $f$
  • can minimize $M_f$ instead of $f$, though $M_f$ could be hard to evaluate
  • motivation: can show that
    $$ M_f = \big( f^* + \tfrac12 \|\cdot\|_2^2 \big)^* $$
  • Moreau envelope obtains a smooth approximation via:
    • taking the conjugate
    • regularizing to get a strongly convex function
    • taking the conjugate again
  • example: Moreau envelope of $f = |\cdot|$ is the Huber function
    $$ \varphi^{\mathrm{huber}}(v) = \begin{cases} v^2/2 & |v| \le 1 \\ |v| - 1/2 & |v| > 1 \end{cases} $$
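As a quick numerical check (my own sketch, with a brute-force grid minimization standing in for the infimum), the Moreau envelope of $|\cdot|$ with $\lambda = 1$ matches the Huber function:

```python
import numpy as np

def huber(v):  # Huber function, threshold 1
    return np.where(np.abs(v) <= 1, v ** 2 / 2, np.abs(v) - 0.5)

def moreau_abs(v, n=20001):
    # brute-force M_f(v) = inf_x ( |x| + (1/2)(x - v)^2 ), with f = |.|, lam = 1
    x = np.linspace(-10.0, 10.0, n)
    return np.min(np.abs(x) + 0.5 * (x - v) ** 2)

vs = np.linspace(-3.0, 3.0, 13)
print(np.allclose([moreau_abs(v) for v in vs], huber(vs), atol=1e-3))  # True
```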

Modified gradient step

  • many relationships between proximal operators and gradient steps
  • proximal operator is a gradient step for the Moreau envelope:
    $$ \mathbf{prox}_{\lambda f}(x) = x - \lambda \nabla M_{\lambda f}(x) $$
  • for small $\lambda$, $\mathbf{prox}_{\lambda f}$ converges to a gradient step in $f$:
    $$ \mathbf{prox}_{\lambda f}(x) \approx x - \lambda \nabla f(x) $$
  • parameter $\lambda$ can be interpreted as a step size, though proximal methods will generally work even for large step sizes, unlike the gradient method

Resolvent of subdifferential operator

  • if $z = \mathbf{prox}_{\lambda f}(x)$, then
    $$ 0 \in \partial f(z) + \frac{1}{\lambda}(z - x) $$
  • can rewrite as
    $$ z = (I + \lambda\, \partial f)^{-1}(x) $$
  • must be careful to interpret $\partial f$ and expressions using it as relations
  • the mapping $(I + \lambda\, \partial f)^{-1}$ is known as the resolvent of the operator $\partial f$

Moreau decomposition

  • the following relation always holds:
    $$ v = \mathbf{prox}_f(v) + \mathbf{prox}_{f^*}(v) $$
  • main link between proximal operators and duality
  • a generalization of the orthogonal decomposition induced by a subspace $L$:
    $$ v = \Pi_L(v) + \Pi_{L^\perp}(v) $$
    follows from Moreau decomposition and $(I_L)^* = I_{L^\perp}$

Norms and norm balls

  • in general: if $f = \|\cdot\|$ and $B$ is the unit ball of the dual norm, then
    $$ \mathbf{prox}_{\lambda f}(v) = v - \lambda\, \Pi_B(v/\lambda) $$
    by Moreau decomposition
  • if $f = \|\cdot\|_2$, then $B$ is the unit $\ell_2$ ball, and
    $$ \mathbf{prox}_{\lambda f}(v) = \begin{cases} (1 - \lambda/\|v\|_2)\, v & \|v\|_2 \ge \lambda \\ 0 & \|v\|_2 < \lambda \end{cases} $$
    sometimes called the 'block soft thresholding' operator (sketched in code after this list)
  • if $f = \|\cdot\|_1$, then $B$ is the unit $\ell_\infty$ ball, and
    $$ \mathbf{prox}_{\lambda f}(v)_i = \begin{cases} v_i - \lambda & v_i \ge \lambda \\ 0 & |v_i| \le \lambda \\ v_i + \lambda & v_i \le -\lambda \end{cases} $$
    i.e., we recover (elementwise) soft thresholding
  • if $f = \|\cdot\|_\infty$, then $B$ is the unit $\ell_1$ ball; simple algorithms are available for projecting onto it
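A minimal sketch of block soft thresholding (function name my own), directly implementing the $\ell_2$ formula above:

```python
import numpy as np

def block_soft_threshold(v, lam):
    """prox of lam * ||.||_2: shrink the whole vector toward 0."""
    nrm = np.linalg.norm(v)
    return np.zeros_like(v) if nrm < lam else (1.0 - lam / nrm) * v
```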

Proximal minimization

  • the proximal minimization or proximal point algorithm is
    $$ x^{k+1} = \mathbf{prox}_{\lambda f}(x^k) $$
  • the simplest proximal method; can be interpreted as
    • gradient method applied to $M_{\lambda f}$ rather than $f$
    • simple iteration for finding a fixed point of $\mathbf{prox}_{\lambda f}$
  • if $f(x) = \tfrac12 x^T A x - b^T x$ with $A \succeq 0$, reduces to iterative refinement for solving $Ax = b$ (see the sketch after this list)
  • works for any fixed $\lambda > 0$, or for non-summable $\lambda^k$ ($\sum_k \lambda^k = \infty$)
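A small sketch under the quadratic assumption above (data my own): each prox evaluation is a linear solve, and the iteration refines an estimate of $A^{-1}b$.

```python
import numpy as np

# proximal point iteration for f(x) = (1/2) x^T A x - b^T x, A positive definite;
# prox_{lam f}(v) solves (A + I/lam) x = b + v/lam, i.e., iterative refinement for Ax = b
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])
lam = 1.0

x = np.zeros(2)
for _ in range(60):
    x = np.linalg.solve(A + np.eye(2) / lam, b + x / lam)

print(x, np.linalg.solve(A, b))  # iterates converge to the minimizer A^{-1} b
```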

Operator splitting

  • the most useful proximal methods use the idea of operator splitting
  • these algorithms minimize $f + g$ using only $\mathbf{prox}_f$ or $\mathbf{prox}_g$
  • useful when $f$ and $g$ each have useful structure separately
  • very simple historical example: alternating projections to find a point $x \in C \cap D$

Proximal gradient method

minimize $f(x) + g(x)$, where $f$ is smooth and $g$ is closed proper convex
  • method:
    $$ x^{k+1} = \mathbf{prox}_{\lambda^k g}\big( x^k - \lambda^k \nabla f(x^k) \big) $$
  • converges with rate $O(1/k)$ when $\nabla f$ is Lipschitz continuous with constant $L$ and step sizes are $\lambda^k = \lambda \in (0, 1/L]$
  • special case: projected gradient method (take $g = I_C$)
  • if $L$ is not known (usually the case), can use the following line search:

    given $\lambda^{k-1}$ and parameter $\beta \in (0, 1)$. Let $\lambda := \lambda^{k-1}$.
    repeat
      1. Let $z := \mathbf{prox}_{\lambda g}(x^k - \lambda \nabla f(x^k))$.
      2. break if $f(z) \le \hat f_\lambda(z, x^k)$.
      3. Update $\lambda := \beta \lambda$.
    return $\lambda^k := \lambda$, $x^{k+1} := z$

    typical value of $\beta$ is 1/2, and
    $$ \hat f_\lambda(z, x) = f(x) + \nabla f(x)^T (z - x) + \frac{1}{2\lambda} \|z - x\|_2^2 $$
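A sketch of the method with the line search above (function names and interface my own; `prox_g(v, lam)` is assumed to evaluate $\mathbf{prox}_{\lambda g}(v)$):

```python
import numpy as np

def prox_gradient(f, grad_f, prox_g, x0, lam0=1.0, beta=0.5, iters=100):
    """Proximal gradient with backtracking line search; f is the smooth part."""
    x, lam = x0, lam0
    for _ in range(iters):
        g = grad_f(x)
        while True:
            z = prox_g(x - lam * g, lam)
            # quadratic upper model f_hat_lam(z, x) of f around x
            f_hat = f(x) + g @ (z - x) + ((z - x) @ (z - x)) / (2 * lam)
            if f(z) <= f_hat:
                break
            lam *= beta
        x = z
    return x
```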

Interpretations

  • $x^+ = \mathbf{prox}_{\lambda g}(x - \lambda \nabla f(x))$ is the solution to
    $$ \text{minimize} \quad g(z) + \frac{1}{2\lambda} \| z - x + \lambda \nabla f(x) \|_2^2 $$
    a trade-off between minimizing $g$ and being close to the gradient step for $f$
  • majorization-minimization method for $f + g$:
    • keep minimizing a convex upper bound to the objective that is tight at the previous iterate
    • here, use $\hat f_\lambda(z, x) + g(z)$ as the upper bound (valid when $\lambda \le 1/L$)
  • $x^\star$ minimizes $f + g$ if and only if
    $$ x^\star = \mathbf{prox}_{\lambda g}\big( x^\star - \lambda \nabla f(x^\star) \big) $$
    i.e., $x^\star$ is a fixed point of the forward-backward operator

Accelerated proximal gradient method

minimize $f(x) + g(x)$, where $f$ is smooth and $g$ is closed proper convex
  • method:
    $$ y^{k+1} = x^k + \omega^k (x^k - x^{k-1}), \qquad x^{k+1} = \mathbf{prox}_{\lambda^k g}\big( y^{k+1} - \lambda^k \nabla f(y^{k+1}) \big) $$
    works for $\omega^k = k/(k+3)$ and a similar line search as before
  • faster $O(1/k^2)$ convergence rate, originated with Nesterov (1983)

Applications

Lasso

$$ \text{minimize} \quad \tfrac12 \|Ax - b\|_2^2 + \gamma \|x\|_1 $$
with $A \in \mathbf{R}^{m \times n}$ and regularization parameter $\gamma > 0$
  • proximal gradient method (similar for accelerated; see the sketch below):
    $$ x^{k+1} = S_{\lambda^k \gamma}\big( x^k - \lambda^k A^T (A x^k - b) \big) $$
    where $S$ is soft thresholding
  • faster implementations:
    • parallel matrix-vector multiplication
    • if $m \gg n$, precompute $A^T A \in \mathbf{R}^{n \times n}$ and $A^T b$, then solve an equivalent smaller lasso problem; effort is then mostly in computing $A^T A$
    • compute $Ax$ and $A^T y$ in parallel, with one block of $A$ in memory at a time
    • compute the entire regularization path (a sequence of values of $\gamma$) with warm starting
  • can easily generalize to group lasso, elastic net, GLMs, ...
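A small self-contained sketch of the iteration on random data (sizes and $\gamma$ arbitrary, not from the original notes):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, gamma = 40, 100, 0.5
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

lam = 1.0 / np.linalg.norm(A, 2) ** 2    # constant step 1/L, L = largest sing. value squared
x = np.zeros(n)
for _ in range(1000):
    v = x - lam * A.T @ (A @ x - b)      # gradient step on (1/2)||Ax - b||^2
    x = np.sign(v) * np.maximum(np.abs(v) - lam * gamma, 0.0)  # soft thresholding

print(np.count_nonzero(x), "nonzeros of", n)  # lasso yields a sparse solution
```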

Monotone Operators

Relations

  • a relation $F$ on $\mathbf{R}^n$ is a subset of $\mathbf{R}^n \times \mathbf{R}^n$
  • dom $F = \{ x \mid \exists y\ (x, y) \in F \}$
  • overload $F(x)$ to mean the set $\{ y \mid (x, y) \in F \}$
  • can think of $F$ as a 'set-valued mapping', i.e., from dom $F$ into $2^{\mathbf{R}^n}$
  • when $F(x)$ is always empty or a singleton, we say $F$ is a function
  • any function (or operator) is a relation; $F(x)$ is then ambiguous: it can mean $y$ or $\{y\}$
  • Examples
    • empty relation: $F = \emptyset$
    • full relation: $F = \mathbf{R}^n \times \mathbf{R}^n$
    • identity: $I = \{ (x, x) \mid x \in \mathbf{R}^n \}$
    • zero: $0 = \{ (x, 0) \mid x \in \mathbf{R}^n \}$
    • subdifferential relation: $\partial f = \{ (x, g) \mid g \in \partial f(x) \}$

Operations on relations

  • inverse (relation): $F^{-1} = \{ (y, x) \mid (x, y) \in F \}$
    • inverse exists for any relation
    • coincides with the inverse function, when the inverse function exists
  • composition: $FG = \{ (x, z) \mid \exists y\ (x, y) \in G,\ (y, z) \in F \}$
  • scalar multiplication: $\lambda F = \{ (x, \lambda y) \mid (x, y) \in F \}$
  • addition: $F + G = \{ (x, y + z) \mid (x, y) \in F,\ (x, z) \in G \}$
  • Example: Resolvent of an operator
    • for relation $F$ and $\lambda \in \mathbf{R}$, the resolvent (much more on this later) is the relation
      $$ R = (I + \lambda F)^{-1} $$
    • for $F = \partial f$, the resolvent is the proximal operator $\mathbf{prox}_{\lambda f}$

Generalized equations

  • goal: solve the generalized equation $0 \in F(x)$
  • i.e., find $x$ with $(x, 0) \in F$
  • solution set or zero set is $F^{-1}(0)$
  • if $F = \partial f$ and $f$ is convex, then $0 \in \partial f(x)$ means $x$ minimizes $f$

Monotone operators

  • a relation $F$ on $\mathbf{R}^n$ is monotone if
    $$ (u - v)^T (x - y) \ge 0 \quad \text{for all } (x, u), (y, v) \in F $$
  • $F$ is maximal monotone if there is no monotone operator that properly contains it
  • on $\mathbf{R}$, $F$ is maximal monotone iff it is a connected curve with no endpoints, with nonnegative (or infinite) slope
  • suppose $F$ and $G$ are monotone
    • sum: $F + G$ is monotone
    • nonnegative scaling: if $\lambda \ge 0$, then $\lambda F$ is monotone
    • inverse: $F^{-1}$ is monotone
    • congruence: for $A \in \mathbf{R}^{n \times m}$, $A^T F(Ax + b)$ is monotone (on $\mathbf{R}^m$)
    • zero set: $F^{-1}(0)$ is convex if $F$ is maximal monotone
  • affine function $F(x) = Ax + b$ is monotone iff $A + A^T \succeq 0$

Subdifferential

$\partial f$ is monotone:
  • suppose $u \in \partial f(x)$ and $v \in \partial f(y)$
  • then, by the definition of the subgradient,
    $$ f(y) \ge f(x) + u^T (y - x), \qquad f(x) \ge f(y) + v^T (x - y) $$
  • add these and cancel to get
    $$ (u - v)^T (x - y) \ge 0 $$
if $f$ is convex closed proper (CCP), then $\partial f$ is maximal monotone

Nonexpansive and contractive operators

  • $F$ has Lipschitz constant $L$ if
    $$ \| F(x) - F(y) \|_2 \le L \| x - y \|_2 \quad \text{for all } x, y $$
  • for $L = 1$, we say $F$ is nonexpansive
  • for $L < 1$, we say $F$ is a contraction (with contraction factor $L$)

Properties

  • if $F$ and $G$ have Lipschitz constant $L$, so does any convex combination
    $$ \theta F + (1 - \theta) G, \qquad \theta \in [0, 1] $$
  • composition of nonexpansive operators is nonexpansive
  • composition of a nonexpansive operator and a contraction is a contraction
  • fixed point set of a nonexpansive operator $F$ (with dom $F = \mathbf{R}^n$)
    $$ X = \{ x \mid F(x) = x \} $$
    is convex (but can be empty)
  • a contraction has exactly one fixed point

Resolvent and Cayley operator

  • for $\lambda > 0$, the resolvent of a relation $F$ is
    $$ R = (I + \lambda F)^{-1} $$
  • when $\lambda > 0$ and $F$ is monotone, $R$ is nonexpansive (thus a function)
  • when $\lambda > 0$ and $F$ is maximal monotone, dom $R = \mathbf{R}^n$
  • the Cayley operator of $F$ is
    $$ C = 2R - I $$
  • when $\lambda > 0$ and $F$ is monotone, $C$ is nonexpansive
  • we write $R_\lambda$ and $C_\lambda$ to explicitly show the dependence on $\lambda$

Proof that $C$ is nonexpansive

assume $\lambda > 0$ and $F$ is monotone
  • suppose $u = R(x)$ and $v = R(y)$, i.e.,
    $$ x \in u + \lambda F(u), \qquad y \in v + \lambda F(v) $$
  • subtract to get $x - y \in u - v + \lambda\, (F(u) - F(v))$
  • multiply by $(u - v)^T$ and use monotonicity of $F$ to get
    $$ (u - v)^T (x - y) \ge \| u - v \|_2^2 $$
  • so when $x = y$, we must have $u = v$ (i.e., $R$ is a function)
  • now let's show $C$ is nonexpansive:
    $$ \| C(x) - C(y) \|_2^2 = \| 2(u - v) - (x - y) \|_2^2 = \| x - y \|_2^2 - 4 (u - v)^T \big( (x - y) - (u - v) \big) \le \| x - y \|_2^2 $$
    using the inequality above
  • $R$ is nonexpansive since it is the average of $C$ and $I$:
    $$ R = \tfrac12 I + \tfrac12 C $$
Example: Linear operators
  • linear mapping $F(x) = Ax$ is
    • monotone iff $A + A^T \succeq 0$
    • nonexpansive iff $\|A\|_2 \le 1$
  • $R = (I + \lambda A)^{-1}$ and $C = 2(I + \lambda A)^{-1} - I$
    • $I + \lambda A$ is nonsingular when $A$ is monotone and $\lambda > 0$
  • for the matrix case, we have an alternative formula for the Cayley operator:
    $$ C = (I - \lambda A)(I + \lambda A)^{-1} $$
    cf. the bilinear function $f(a) = (1 - \lambda a)/(1 + \lambda a)$, which maps
    $$ \{ a \mid \Re\, a \ge 0 \} \ \text{into} \ \{ z \mid |z| \le 1 \} $$
Resolvent of subdifferential: Proximal mapping
  • suppose $y = (I + \lambda\, \partial f)^{-1}(x)$, with $f$ convex
  • this means $x \in y + \lambda\, \partial f(y)$
  • rewrite as
    $$ 0 \in \partial_y \Big( \lambda f(y) + \tfrac12 \| y - x \|_2^2 \Big) $$
    which is the same as
    $$ y = \operatorname*{argmin}_u \Big( f(u) + \frac{1}{2\lambda} \| u - x \|_2^2 \Big) $$
  • the RHS is called the proximal mapping of $f$, denoted $\mathbf{prox}_{\lambda f}(x)$
Example: Indicator function
  • take $f = I_C$, the indicator function of a convex set $C$
  • $\partial f$ is the normal cone operator:
    $$ \partial I_C(x) = N_C(x) = \{ g \mid g^T (z - x) \le 0 \ \text{for all } z \in C \} \quad (x \in C) $$
  • the proximal operator of $f$ (i.e., the resolvent of $\partial f$) is
    $$ (I + \lambda N_C)^{-1}(x) = \Pi_C(x) $$
    where $\Pi_C$ is (Euclidean) projection onto $C$

Resolvent of multiplier to residual map

  • take $F$ to be the multiplier-to-residual mapping for the convex problem
    $$ \text{minimize} \ f(x) \quad \text{subject to} \ Ax = b $$
  • where $F(u) = b - A\tilde x$, with $\tilde x$ a minimizer of the Lagrangian $f(x) + u^T (Ax - b)$
  • $y = R(u)$ means $u \in y + \lambda F(y)$
  • i.e., for some $x$: $u = y + \lambda (b - Ax)$, with $0 \in \partial f(x) + A^T y$
  • write as $y = u + \lambda (Ax - b)$, with $0 \in \partial f(x) + A^T \big( u + \lambda (Ax - b) \big)$
  • write the second term as $\partial_x \big( u^T (Ax - b) + \tfrac{\lambda}{2} \| Ax - b \|_2^2 \big)$, so
    $$ 0 \in \partial_x \Big( f(x) + u^T (Ax - b) + \frac{\lambda}{2} \| Ax - b \|_2^2 \Big) $$
  • the function on the right side is the augmented Lagrangian for the problem
  • so $R(u)$ can be found as
    $$ R(u) = u + \lambda (A\tilde x - b), \qquad \tilde x = \operatorname*{argmin}_x \Big( f(x) + u^T (Ax - b) + \frac{\lambda}{2} \| Ax - b \|_2^2 \Big) $$

Fixed points of Cayley and resolvent operators

  • assume $F$ is maximal monotone, $\lambda > 0$
  • solutions of $0 \in F(x)$ are fixed points of $R$:
    $$ 0 \in F(x) \iff x \in x + \lambda F(x) \iff x = (I + \lambda F)^{-1}(x) = R(x) $$
  • solutions of $0 \in F(x)$ are fixed points of $C$:
    $$ C(x) = 2R(x) - x = x \iff R(x) = x $$
  • key result: we can solve $0 \in F(x)$ by finding fixed points of $R$ or $C$
  • next: how to actually find these fixed points

Contraction mapping theorem

  • also known as the Banach fixed point theorem
  • assume $F$ is a contraction, with Lipschitz constant $L < 1$ and dom $F = \mathbf{R}^n$
  • the iteration
    $$ x^{k+1} = F(x^k) $$
    converges to the unique fixed point of $F$
  • proof (sketch):
    • the sequence is Cauchy:
      $$ \| x^{k+l} - x^k \|_2 \le \frac{L^k}{1 - L} \| x^1 - x^0 \|_2 $$
    • hence $x^k$ converges to a point $\bar x$
    • $\bar x$ must be (the) fixed point
Example: Gradient method
  • assume $f$ is convex, with $mI \preceq \nabla^2 f(x) \preceq LI$, $m > 0$ (i.e., strongly convex, with Lipschitz gradient)
  • gradient method is
    $$ x^{k+1} = x^k - \alpha \nabla f(x^k) = (I - \alpha \nabla f)(x^k) $$
    (fixed points are exactly the solutions of $\nabla f(x) = 0$)
  • $I - \alpha \nabla f$ is Lipschitz with parameter $\max\{ |1 - \alpha m|, |1 - \alpha L| \}$
  • so it is a contraction when $0 < \alpha < 2/L$
  • so the gradient method converges (geometrically) when $0 < \alpha < 2/L$
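A numerical sketch on a strongly convex quadratic (data my own): the error contracts by the predicted factor at each step.

```python
import numpy as np

# f(x) = (1/2) x^T A x, with m, L the extreme eigenvalues of A
A = np.diag([1.0, 4.0])          # m = 1, L = 4
alpha = 0.4                      # 0 < alpha < 2/L = 0.5
x = np.array([1.0, 1.0])
for _ in range(5):
    x_new = x - alpha * (A @ x)  # gradient step; fixed point is x* = 0
    # contraction factor max(|1 - alpha m|, |1 - alpha L|) = 0.6
    print(np.linalg.norm(x_new) / np.linalg.norm(x))
    x = x_new
```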

Damped iteration of a nonexpansive operator

  • suppose $F$ is nonexpansive, dom $F = \mathbf{R}^n$, with fixed point set $X = \{ x \mid F(x) = x \}$
  • can have $X = \emptyset$ (e.g., translation)
  • simple iteration of $F$ need not converge, even when $X \ne \emptyset$ (e.g., rotation; see the sketch after this list)
  • damped iteration:
    $$ x^{k+1} = (1 - \theta^k)\, x^k + \theta^k F(x^k), \qquad \theta^k \in (0, 1) $$
  • important special case: $\theta^k = \theta$, constant
  • another special case: $\theta^k = 1/(k+1)$, which gives simple averaging
    $$ x^{k+1} = \frac{1}{k+1} \big( F(x^0) + \cdots + F(x^k) \big) $$
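A minimal sketch of the rotation example (my own construction): plain iteration of a rotation orbits forever, while the damped iteration converges to the fixed point.

```python
import numpy as np

# F = rotation by 90 degrees: nonexpansive, unique fixed point 0,
# but the plain iteration just circles; damping with theta = 1/2 converges
F = np.array([[0.0, -1.0], [1.0, 0.0]])
x_plain = np.array([1.0, 0.0])
x_damped = np.array([1.0, 0.0])
for _ in range(100):
    x_plain = F @ x_plain
    x_damped = 0.5 * x_damped + 0.5 * (F @ x_damped)

print(np.linalg.norm(x_plain), np.linalg.norm(x_damped))  # 1.0 vs ~1e-15
```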

Convergence results

  • assume $F$ is nonexpansive, $X \ne \emptyset$, and
    $$ \sum_k \theta^k (1 - \theta^k) = \infty $$
    (which holds for the special cases above)
  • then we have
    $$ \liminf_{k \to \infty} \operatorname{dist}(x^k, X) = 0 $$
    i.e., (some) iterates get arbitrarily close to the fixed point set, and
    $$ \liminf_{k \to \infty} \| F(x^k) - x^k \|_2 = 0 $$
    i.e., (some) iterates yield arbitrarily good 'almost fixed points'
  • geometric intuition: $F(x^k)$ is no farther from any $x^\star \in X$ than $x^k$ is (by nonexpansivity)
  • so $x^{k+1}$, a convex combination of $x^k$ and $F(x^k)$, is no farther from $X$ than $x^k$ is

Proof

  • start with the identity
    $$ \| (1 - \theta) a + \theta b \|_2^2 = (1 - \theta) \| a \|_2^2 + \theta \| b \|_2^2 - \theta (1 - \theta) \| a - b \|_2^2 $$
  • apply to $x^{k+1} - x^\star = (1 - \theta^k)(x^k - x^\star) + \theta^k (F(x^k) - x^\star)$, with $x^\star \in X$:
    $$ \| x^{k+1} - x^\star \|_2^2 \le \| x^k - x^\star \|_2^2 - \theta^k (1 - \theta^k) \| F(x^k) - x^k \|_2^2 $$
    using $\| F(x^k) - x^\star \|_2 \le \| x^k - x^\star \|_2$
  • iterate the inequality to get
    $$ \| x^{k+1} - x^\star \|_2^2 \le \| x^0 - x^\star \|_2^2 - \sum_{i=0}^{k} \theta^i (1 - \theta^i) \| F(x^i) - x^i \|_2^2 $$
  • if $\| F(x^i) - x^i \|_2 \ge \epsilon$ for $i = 0, \dots, k$, then
    $$ \epsilon^2 \le \frac{ \| x^0 - x^\star \|_2^2 }{ \sum_{i=0}^{k} \theta^i (1 - \theta^i) } $$
  • RHS goes to zero as $k \to \infty$, so the residuals cannot stay bounded away from zero

Damped Cayley iteration

  • want to solve $0 \in F(x)$ with $F$ maximal monotone
  • damped Cayley iteration:
    $$ x^{k+1} = (1 - \theta^k)\, x^k + \theta^k\, C(x^k) $$
    with $\theta^k \in (0, 1)$ and $\sum_k \theta^k (1 - \theta^k) = \infty$
  • converges (assuming $F^{-1}(0) \ne \emptyset$) in the sense given above
  • important: requires only the ability to evaluate the resolvent map of $F$

Proximal point algorithm

  • take $\theta^k = 1/2$ in the damped Cayley iteration
  • gives the resolvent iteration or proximal point algorithm:
    $$ x^{k+1} = \tfrac12 x^k + \tfrac12 C(x^k) = R(x^k) $$
  • if $F = \partial f$ with $f$ convex, yields the proximal minimization algorithm
    $$ x^{k+1} = \mathbf{prox}_{\lambda f}(x^k) = \operatorname*{argmin}_x \Big( f(x) + \frac{1}{2\lambda} \| x - x^k \|_2^2 \Big) $$
    can interpret as quadratic regularization that goes away in the limit
  • many classical algorithms are just proximal point methods applied to the appropriate maximal monotone operator

Method of multipliers

  • the proximal point algorithm applied to the multiplier-to-residual map
  • like the dual method, but with the augmented Lagrangian and a specific step size
  • converges under far more general conditions than dual subgradient:
    • $f$ need not be strictly convex, or differentiable
    • $f$ can take on the value $+\infty$
  • but not amenable to decomposition

Monotone Operator Splitting Methods

Operator splitting

  • want to solve $0 \in F(x)$ with $F$ maximal monotone
  • main idea: write $F$ as $F = A + B$, with $A$ and $B$ maximal monotone
  • called operator splitting
  • solve using methods that require evaluation of the resolvents
    $$ R_A = (I + \lambda A)^{-1}, \qquad R_B = (I + \lambda B)^{-1} $$
    (or the Cayley operators $C_A = 2R_A - I$ and $C_B = 2R_B - I$)
  • useful when $R_A$ and $R_B$ can be evaluated more easily than $R_F = (I + \lambda F)^{-1}$

Main result

  • $A$, $B$ maximal monotone, so the Cayley operators $C_A$, $C_B$ are nonexpansive
  • hence $C_A C_B$ is nonexpansive
  • key result:
    $$ 0 \in A(x) + B(x) \iff C_A C_B(z) = z, \quad x = R_B(z) $$
  • so solutions of $0 \in F(x)$ can be found from fixed points of the nonexpansive operator $C_A C_B$

Proof of main result

  • write $z = C_A C_B(z)$, with $x = R_B(z)$, as the four equations
    $$ x = R_B(z), \qquad \hat z = 2x - z, \qquad \tilde x = R_A(\hat z), \qquad z = 2\tilde x - \hat z $$
  • subtract the 2nd and 4th equations to conclude $\tilde x = x$
  • the 4th equation is then $z = 2x - \hat z$, so the 3rd gives $x - z = \hat z - x \in \lambda A(x)$
  • now add $x - z \in \lambda A(x)$ and $z - x \in \lambda B(x)$ (from the 1st equation) to get
    $$ 0 \in \lambda \big( A(x) + B(x) \big) $$
  • hence $0 \in A(x) + B(x)$
  • the argument goes the other way as well (but we don't need it)

Peaceman-Rachford and Douglas-Rachford splitting

  • Peaceman-Rachford splitting is the undamped iteration
    $$ z^{k+1} = C_A C_B(z^k) $$
    doesn't converge in the general case; need $C_A$ or $C_B$ to be a contraction
  • Douglas-Rachford splitting is the damped iteration
    $$ z^{k+1} = \tfrac12 \big( I + C_A C_B \big)(z^k) $$
    always converges when $0 \in A(x) + B(x)$ has a solution

Douglas-Rachford splitting

write the D-R iteration as
$$ x^{k+1/2} = R_B(z^k), \qquad z^{k+1/2} = 2x^{k+1/2} - z^k, \qquad x^{k+1} = R_A(z^{k+1/2}), \qquad z^{k+1} = z^k + x^{k+1} - x^{k+1/2} $$
the last update follows from
$$ z^{k+1} = \tfrac12 \big( z^k + 2x^{k+1} - z^{k+1/2} \big) = z^k + x^{k+1} - x^{k+1/2} $$
  • can consider $x^{k+1} - x^{k+1/2}$ as a residual
  • $z^k$ is a running sum of the residuals

Douglas-Rachford algorithm

  • many ways to rewrite/rearrange the D-R algorithm
  • equivalent to many other algorithms; often not obvious
  • need very little: $A$, $B$ maximal monotone; a solution exists
  • $A$ and $B$ are handled separately (via $R_A$ and $R_B$); they are 'uncoupled'

Alternating direction method of multipliers

to minimize $f(x) + g(x)$, we solve $0 \in \partial f(x) + \partial g(x)$ with $A = \partial f$, $B = \partial g$; D-R is
$$ x^{k+1/2} = \mathbf{prox}_{\lambda g}(z^k), \qquad x^{k+1} = \mathbf{prox}_{\lambda f}\big( 2x^{k+1/2} - z^k \big), \qquad z^{k+1} = z^k + x^{k+1} - x^{k+1/2} $$
a special case of the alternating direction method of multipliers (ADMM)
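A generic sketch of these three updates (interface my own); `prox_f` and `prox_g` are assumed to evaluate $\mathbf{prox}_{\lambda f}$ and $\mathbf{prox}_{\lambda g}$ at a fixed $\lambda$:

```python
def douglas_rachford(prox_f, prox_g, z0, iters=300):
    """D-R iteration for minimizing f + g, as written above."""
    z = z0
    for _ in range(iters):
        x_half = prox_g(z)          # x^{k+1/2} = prox_{lam g}(z^k)
        x = prox_f(2 * x_half - z)  # x^{k+1}   = prox_{lam f}(2 x^{k+1/2} - z^k)
        z = z + (x - x_half)        # z^k is a running sum of residuals
    return x_half                   # converges to a minimizer of f + g
```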

Constrained optimization

  • constrained convex problem:
    $$ \text{minimize} \ f(x) \quad \text{subject to} \ x \in C $$
  • take $A = N_C$ and $B = \partial f$
  • so $R_A = \Pi_C$ and $R_B = \mathbf{prox}_{\lambda f}$
  • D-R is
    $$ x^{k+1/2} = \mathbf{prox}_{\lambda f}(z^k), \qquad x^{k+1} = \Pi_C\big( 2x^{k+1/2} - z^k \big), \qquad z^{k+1} = z^k + x^{k+1} - x^{k+1/2} $$

Dykstra's alternating projections

  • find a point in the intersection $C \cap D$ of convex sets
  • D-R gives the algorithm
    $$ x^{k+1/2} = \Pi_C(z^k), \qquad x^{k+1} = \Pi_D\big( 2x^{k+1/2} - z^k \big), \qquad z^{k+1} = z^k + x^{k+1} - x^{k+1/2} $$
  • this is Dykstra's alternating projections algorithm (see the sketch below)
  • much faster than classical alternating projections (e.g., for $C$ and $D$ polyhedral, converges in a finite number of steps)
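A concrete sketch with two hypothetical sets chosen so both projections are closed-form: $C$ the unit ball and $D$ the halfspace $\{x \mid x_1 \ge 0.5\}$.

```python
import numpy as np

def proj_C(v):  # Euclidean projection onto the unit ball
    nrm = np.linalg.norm(v)
    return v if nrm <= 1 else v / nrm

def proj_D(v):  # projection onto the halfspace {x : x[0] >= 0.5}
    w = v.copy()
    w[0] = max(w[0], 0.5)
    return w

z = np.array([-2.0, 3.0])
for _ in range(100):
    x_half = proj_C(z)
    x = proj_D(2 * x_half - z)
    z = z + (x - x_half)

# x_half lands (numerically) in C ∩ D
print(x_half, np.linalg.norm(x_half) <= 1 + 1e-6, x_half[0] >= 0.5 - 1e-6)
```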

Consensus optimization

  • want to minimize $\sum_{i=1}^N f_i(x)$
  • rewrite as the consensus problem
    $$ \text{minimize} \ \sum_{i=1}^N f_i(x_i) \quad \text{subject to} \ x_1 = x_2 = \cdots = x_N $$
  • D-R consensus optimization (with $f(x) = \sum_i f_i(x_i)$ and $\Pi$ the projection onto the consensus set $\{ (x_1, \dots, x_N) \mid x_1 = \cdots = x_N \}$):
    $$ x^{k+1/2} = \mathbf{prox}_{\lambda f}(z^k), \qquad x^{k+1} = \Pi\big( 2x^{k+1/2} - z^k \big), \qquad z^{k+1} = z^k + x^{k+1} - x^{k+1/2} $$

Douglas-Rachford consensus

  • the $x^{k+1/2}$-update splits into $N$ separate (parallel) problems:
    $$ x_i^{k+1/2} = \mathbf{prox}_{\lambda f_i}(z_i^k), \quad i = 1, \dots, N $$
  • the $x^{k+1}$-update is averaging:
    $$ x_i^{k+1} = \overline{2x^{k+1/2} - z^k} = 2\, \overline{x^{k+1/2}} - \overline{z^k} $$
    (overline denotes the average over $i = 1, \dots, N$)
  • the $z$-update becomes
    $$ z_i^{k+1} = z_i^k + 2\, \overline{x^{k+1/2}} - \overline{z^k} - x_i^{k+1/2} $$
  • taking the average of the last equation, we get $\overline{z^{k+1}} = \overline{x^{k+1/2}}$
  • renaming $x^{k+1/2}$ as $x^{k+1}$ and writing $z_i^k = \overline{x^k} - u_i^k$, D-R consensus becomes
    $$ x_i^{k+1} = \mathbf{prox}_{\lambda f_i}\big( \overline{x^k} - u_i^k \big), \qquad u_i^{k+1} = u_i^k + x_i^{k+1} - \overline{x^{k+1}} $$
  • subsystem (local) state: $x_i^k$, $u_i^k$
  • gather the $x_i^k$'s to compute $\overline{x^k}$, which is then scattered
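A tiny sketch of the renamed iteration with $N = 3$ scalar quadratics $f_i(x) = \tfrac12 (x - a_i)^2$ (data my own), whose prox is $(v + \lambda a_i)/(1 + \lambda)$; the consensus minimizer of $\sum_i f_i$ is the mean of the $a_i$:

```python
import numpy as np

a = np.array([1.0, 2.0, 6.0])
lam = 1.0
x = np.zeros(3)      # local variables x_i
u = np.zeros(3)      # local dual variables u_i
xbar = 0.0
for _ in range(100):
    x = ((xbar - u) + lam * a) / (1 + lam)  # parallel local prox steps
    xbar = x.mean()                          # gather: average the x_i
    u = u + x - xbar                         # local updates; xbar is scattered

print(xbar, a.mean())  # both 3.0
```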