LPs
Standard form: min c^T x s.t. Ax = b, x ≥ 0, b ≥ 0.
Getting it to standard form:
Getting rid of ≥, ≤ (add a slack variable): x_1 ≤ 4 → x_1 + x_2 = 4, x_2 ≥ 0.
Getting rid of free vars: x ∈ R → x = u − v, u, v ≥ 0.
Bounded vars: x ∈ [2, 5] → 2 ≤ x, x ≤ 5, then handle each inequality as above.
Simplex algorithm:
(1) Take the cost function, turn it into min z s.t. c^T x = z, remainder in standard LP form.
(2) Pivoting: do Gaussian elimination to get rid of as many variables as possible, without distributing z around.
(3) Variables that have been eliminated except in one equation are dependent/basic; the others are independent/non-basic. We can always get a feasible point by setting the non-basic variables to zero and reading out the basic variables. In tableau form:
[ 1  0   C ]
[ 0  I_m A ] · [−z, x_B, x_N]^T = [−z_0, b]^T
(4) Improve the solution: find the smallest reduced cost C_J. If C_J ≥ 0, optimality is reached; quit. Else, column J is incoming.
(5) Find how far we can go by picking the outgoing variable: r = argmin_{i : A_{i,J} > 0} b_i / A_{i,J}.
(6) Perform elimination to get rid of J, using the equation that makes the outgoing variable a basic one. That is, take the only equation in which the outgoing variable is non-zero, and eliminate the incoming variable with it.
(7) Repeat from (4) until optimality is reached.
Convex sets, fcns:
Defns: a set is X if, for any weighted sum of its points satisfying Y, the weighted sum is in the set. Convex: Σ_i θ_i = 1, θ_i ≥ 0. Affine: Σ_i θ_i = 1. Conic: θ_i ≥ 0.
Examples: lines, line segments, hyperplanes, halfspaces, L_p balls for p ≥ 1, polyhedra, polytopes.
Preserving operations: translation, scaling, intersection, affine functions (e.g., projection, coordinate dropping), set sum {c_1 + c_2 | c_1 ∈ C_1, c_2 ∈ C_2}, direct sum {(c_1, c_2) | c_1 ∈ C_1, c_2 ∈ C_2}, perspective projection.
Conv. fcns:
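Steps (3)–(7) can be sketched as a small tableau simplex. This is a minimal illustration, not a robust solver: the toy LP (max x1 + 2x2 s.t. x1 + x2 ≤ 4, x1 ≤ 3), the helper name `simplex`, and the assumption of b ≥ 0 with a starting slack basis are all illustrative; degeneracy and unboundedness are not handled.

```python
import numpy as np

def simplex(c, A, b, basis):
    """Tableau simplex for min c@x s.t. A@x = b, x >= 0 (b >= 0),
    starting from a feasible basis (column indices forming an identity)."""
    m, n = A.shape
    T = np.zeros((m + 1, n + 1))
    T[:m, :n], T[:m, n] = A, b
    T[m, :n] = c
    for i, j in enumerate(basis):          # zero out reduced costs of basic vars
        T[m] -= T[m, j] * T[i]
    while True:
        j = int(np.argmin(T[m, :n]))       # incoming: most negative reduced cost
        if T[m, j] >= -1e-12:              # all reduced costs >= 0: optimal
            break
        ratios = [(T[i, n] / T[i, j], i) for i in range(m) if T[i, j] > 1e-12]
        _, r = min(ratios)                 # outgoing row via the ratio test
        T[r] /= T[r, j]                    # pivot: eliminate column j elsewhere
        for i in range(m + 1):
            if i != r:
                T[i] -= T[i, j] * T[r]
        basis[r] = j
    x = np.zeros(n)
    x[basis] = T[:m, n]
    return x, -T[m, n]                     # solution and optimal objective

# max x1 + 2*x2 s.t. x1 + x2 <= 4, x1 <= 3, x >= 0, in standard form with slacks x3, x4
c = np.array([-1.0, -2.0, 0.0, 0.0])
A = np.array([[1.0, 1.0, 1.0, 0.0],
              [1.0, 0.0, 0.0, 1.0]])
b = np.array([4.0, 3.0])
x, val = simplex(c, A, b, basis=[2, 3])
print(x, val)                              # x = (0, 4, 0, 3), objective -8
```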
Defn: f(θx + (1−θ)y) ≤ θf(x) + (1−θ)f(y); first-order characterization: f(y) ≥ f(x) + ∇f(x)^T (y − x).
Preserving operations, functions: non-negative weighted sum, pointwise max, affine map f(Ax + b), composition (under suitable monotonicity conditions), perspective map.
Strict, strong convexity
Defns:
Strict convexity: f(θx + (1−θ)y) < θf(x) + (1−θ)f(y) for x ≠ y, θ ∈ (0, 1) (basically, nowhere linear).
m-strong convexity: f(θx + (1−θ)y) ≤ θf(x) + (1−θ)f(y) − (m/2)θ(1−θ)||x − y||_2^2.
Better strong convexity defns:
(∇f(x) − ∇f(y))^T (x − y) ≥ m||x − y||_2^2
f(y) ≥ f(x) + ∇f(x)^T (y − x) + (m/2)||y − x||_2^2
∇^2 f(x) ⪰ mI.
Gradient descent
Given x^0, repeat x^k = x^{k−1} − t_k ∇f(x^{k−1}).
Picking t: can diverge if t is too big, too slow if t is too small.
Backtracking line search: start with t = 1; while f(x − t∇f(x)) > f(x) − αt||∇f(x)||_2^2, update t = βt, with 0 < α ≤ 1/2, 0 < β < 1.
Subgradients
Defn.: a subgradient of convex f at x is any g s.t. f(y) ≥ f(x) + g^T (y − x) for all y.
Subdifferential ∂f(x): the set of all such g.
SG calculus: ∂(af) = a ∂f (a ≥ 0); ∂(f_1 + f_2) = ∂f_1 + ∂f_2; ∂[f(Ax + b)] = A^T ∂f(Ax + b).
Finite pointwise max: ∂ max_{f ∈ F} f(x) is the convex hull of the subdifferentials of the active functions (those achieving the max at x).
Norms: if f(x) = ||x||_p and 1/p + 1/q = 1, then ||x||_p = max_{||z||_q ≤ 1} z^T x; thus ∂||x||_p = {y : ||y||_q ≤ 1, y^T x = max_{||z||_q ≤ 1} z^T x}.
Optimality: f(x*) = min_x f(x) ⟺ 0 ∈ ∂f(x*).
Remember that subgradients may not exist for non-convex functions!
Subgradient method
Given x^0, repeat x^k = x^{k−1} − t_k g^{k−1} with g^{k−1} ∈ ∂f(x^{k−1}).
The SG method is not a descent method; keep track of the best iterate so far.
Picking t: square summable but not summable (e.g., t_k = 1/k). Polyak steps: t_k = (f(x^{k−1}) − f(x*))/||g^{k−1}||_2^2.
Projected SG method: project onto the feasible set after taking each step.
Generalized GD
Suppose f(x) = g(x) + h(x) with g convex and differentiable, h convex but not necessarily differentiable. Define prox_t(x) = argmin_z (1/(2t))||x − z||_2^2 + h(z); GGD is:
x^k = prox_{t_k}(x^{k−1} − t_k ∇g(x^{k−1}))
"Generalized gradient" since, if G_t(x) = (1/t)(x − prox_t(x − t∇g(x))), the update is x^k = x^{k−1} − t_k G_{t_k}(x^{k−1}).
With backtracking: while g(x − tG_t(x)) > g(x) − t∇g(x)^T G_t(x) + (t/2)||G_t(x)||_2^2, update t = βt.
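A minimal sketch of gradient descent with the backtracking rule above, on a made-up strongly convex quadratic (the particular Q, b, and the constants α = 0.3, β = 0.8 are illustrative choices, not from the notes):

```python
import numpy as np

# f(x) = 0.5 x^T Q x - b^T x, gradient Q x - b, minimizer solves Q x = b
Q = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
f = lambda x: 0.5 * x @ Q @ x - b @ x
grad = lambda x: Q @ x - b

alpha, beta = 0.3, 0.8          # 0 < alpha <= 1/2, 0 < beta < 1
x = np.zeros(2)
for _ in range(100):
    g = grad(x)
    t = 1.0                     # restart backtracking from t = 1 each iteration
    while f(x - t * g) > f(x) - alpha * t * (g @ g):
        t *= beta               # shrink until sufficient decrease holds
    x = x - t * g

print(x, np.linalg.solve(Q, b))  # iterate vs. exact minimizer
```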
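The subgradient method with the t_k = 1/k (square-summable-not-summable) schedule, sketched on the hypothetical non-differentiable f(x) = |x − 3|; since it is not a descent method, the best iterate is tracked:

```python
def f(x):
    return abs(x - 3.0)

def subgrad(x):
    # any element of the subdifferential works; at the kink, 0 is in [-1, 1]
    if x > 3.0:
        return 1.0
    if x < 3.0:
        return -1.0
    return 0.0

x, best_x, best_f = 0.0, 0.0, f(0.0)
for k in range(1, 2001):
    x = x - (1.0 / k) * subgrad(x)   # diminishing step t_k = 1/k
    if f(x) < best_f:                # keep track of best so far
        best_x, best_f = x, f(x)

print(best_x, best_f)                # best iterate approaches the minimizer 3
```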
Example (lasso): the prox is argmin_z (1/(2t))||β − z||_2^2 + λ||z||_1 = S_{λt}(β), where S_λ is the soft-threshold operator:
[S_λ(β)]_i = β_i − λ if β_i > λ; 0 if −λ ≤ β_i ≤ λ; β_i + λ if β_i < −λ.
Example (matrix completion): objective (1/2) Σ_{(i,j) observed} (Y_{ij} − B_{ij})^2 + λ||B||_* with ||B||_* = Σ_{i=1}^r σ_i(B). Prox function: argmin_Z (1/(2t))||B − Z||_F^2 + λ||Z||_*. Solution: matrix soft-thresholding, U Σ_{λt} V^T where B = U Σ V^T and (Σ_{λt})_{ii} = max{Σ_{ii} − λt, 0}.
Newton's method: originally developed for finding roots; use it to find roots of the gradient. Want ∇f(x) + ∇^2 f(x) Δx = 0; the solution is Δx = −[∇^2 f(x)]^{−1} ∇f(x). Damped Newton method: x^{k+1} = x^k − h_k [∇^2 f(x^k)]^{−1} ∇f(x^k).
Conjugate direction methods: want to solve min (1/2) x^T Q x − b^T x with Q ≻ 0. Define Q-orthogonality (conjugacy) as d_i^T Q d_j = 0 for i ≠ j.
Expanding subspace thm.: let {d_i}_{i=0}^{n−1} be Q-conjugate. (For the method:)
g_k = Q x_k − b; α_k = −g_k^T d_k / (d_k^T Q d_k); x_{k+1} = x_k + α_k d_k.
Proof sketch (g_k ⊥ B_k), by induction:
g_{k+1} = Q x_{k+1} − b = Q(x_k + α_k d_k) − b = (Q x_k − b) + α_k Q d_k = g_k + α_k Q d_k.
From here, by the defn of α_k:
d_k^T g_{k+1} = d_k^T (g_k + α_k Q d_k) = d_k^T g_k − g_k^T d_k = 0.
Algorithm (conjugate gradient): arbitrary x_0, d_0 = −g_0 = b − Q x_0; repeat
α_k = −g_k^T d_k / (d_k^T Q d_k); x_{k+1} = x_k + α_k d_k;
g_{k+1} = Q x_{k+1} − b; β_k = g_{k+1}^T Q d_k / (d_k^T Q d_k); d_{k+1} = −g_{k+1} + β_k d_k.
Quasi-Newton methods
Gist: approximate the Hessian/inverse Hessian.
Symmetric rank-one (SR1) correction:
Update: x_{k+1} = x_k − α_k H_k g_k, with α_k = argmin_α f(x_k − α H_k g_k) (line search) and g_k = ∇f(x_k).
H_{k+1} = H_k + (p_k − H_k q_k)(p_k − H_k q_k)^T / (q_k^T (p_k − H_k q_k)),
p_k = x_{k+1} − x_k; q_k = g_{k+1} − g_k.
Might not be PSD!
DFP (rank 2): H_{k+1} = H_k + p_k p_k^T / (p_k^T q_k) − H_k q_k q_k^T H_k / (q_k^T H_k q_k).
BFGS: update the inverse of the Hessian (via Sherman–Morrison). Let q_k = g_{k+1} − g_k:
H_{k+1} = H_k + (1 + q_k^T H_k q_k / (p_k^T q_k)) p_k p_k^T / (p_k^T q_k) − (p_k q_k^T H_k + H_k q_k p_k^T) / (p_k^T q_k).
LP duality
Let c ∈ R^n, A ∈ R^{m×n}, b ∈ R^m, G ∈ R^{r×n}, h ∈ R^r.
(P) min c^T x s.t. Ax = b, Gx ≤ h.
(D) max −b^T u − h^T v s.t. −A^T u − G^T v = c, v ≥ 0.
Duality: consider min f(x) s.t. h_i(x) ≤ 0, i = 1, …, m; l_j(x) = 0, j = 1, …, r.
Lagrangian: L(x, u, v) = f(x) + Σ_{i=1}^m u_i h_i(x) + Σ_{j=1}^r v_j l_j(x), with u ∈ R^m, v ∈ R^r, u ≥ 0.
Note: f(x) ≥ L(x, u, v) at feasible x.
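The lasso prox step gives the ISTA-style generalized GD loop: a gradient step on the smooth part g(β) = (1/2)||y − Xβ||², then soft-thresholding at level λt. A minimal sketch on made-up toy data (X, y, λ, and the fixed step t = 1/L are illustrative, not from the notes):

```python
import numpy as np

def soft_threshold(b, lam):
    # [S_lam(b)]_i = sign(b_i) * max(|b_i| - lam, 0)
    return np.sign(b) * np.maximum(np.abs(b) - lam, 0.0)

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 5))
true_beta = np.array([2.0, 0.0, 0.0, -1.0, 0.0])
y = X @ true_beta                         # noiseless toy response
lam = 0.5

t = 1.0 / np.linalg.norm(X.T @ X, 2)      # step 1/L, L = largest eigenvalue
beta = np.zeros(5)
for _ in range(500):
    # gradient step on g, then prox of lam*t * ||.||_1
    beta = soft_threshold(beta + t * X.T @ (y - X @ beta), lam * t)

print(beta)   # roughly sparse, close to true_beta (slightly shrunk)
```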
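The conjugate gradient recursion can be sketched directly from the α_k, β_k formulas above; the 3×3 SPD matrix Q and vector b are made up, and in exact arithmetic the method terminates in n = 3 steps:

```python
import numpy as np

Q = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
b = np.array([1.0, 2.0, 3.0])

x = np.zeros(3)
g = Q @ x - b                    # g_0
d = -g                           # d_0 = -g_0 = b - Q x_0
for _ in range(3):               # n steps for an n x n SPD system
    a = -(g @ d) / (d @ Q @ d)   # alpha_k
    x = x + a * d
    g = Q @ x - b                # g_{k+1}
    bk = (g @ Q @ d) / (d @ Q @ d)   # beta_k
    d = -g + bk * d              # next Q-conjugate direction

print(x, np.linalg.solve(Q, b))  # should agree
```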
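A tiny sketch of (damped) Newton on a hypothetical 1-D convex example, f(x) = x − log x on x > 0, so ∇f = 1 − 1/x, ∇²f = 1/x², and the minimizer is x* = 1 (here h_k = 1, i.e., undamped):

```python
x = 0.5
for _ in range(20):
    grad = 1.0 - 1.0 / x          # f'(x)
    hess = 1.0 / x**2             # f''(x)
    x = x - 1.0 * (grad / hess)   # x - h_k [f''(x)]^{-1} f'(x), h_k = 1

print(x)   # converges quadratically to the minimizer 1
```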
Dual problem: let g(u, v) = min_x L(x, u, v); g is the Lagrange dual function. Dual problem: max_{u ≥ 0, v} g(u, v).
Note: the dual problem is always concave (a pointwise min of functions affine in (u, v)).
Strong duality: we always have f* ≥ g* (weak duality), where f*, g* are the primal and dual optimal values. When f* = g*, we have strong duality. If the primal is a convex problem (f, h_i convex, l_j affine) and there exists a strictly feasible x (Slater's condition), then strong duality holds.
Dual example (lasso): Have primal:
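A numeric check of weak/strong duality on a made-up toy problem: min x² s.t. 1 − x ≤ 0. The Lagrangian is L(x, u) = x² + u(1 − x) with u ≥ 0; minimizing over x gives x = u/2, so g(u) = u − u²/4, and Slater's condition holds (x = 2 is strictly feasible):

```python
# primal: min x^2 s.t. 1 - x <= 0, optimum f* = 1 at x = 1
primal_opt = 1.0

# dual function g(u) = min_x L(x, u) = u - u^2/4 (closed form, x* = u/2)
g = lambda u: u - u * u / 4.0

# maximize g over a grid of u >= 0; max is at u = 2 with g(2) = 1
dual_opt = max(g(i / 1000.0) for i in range(0, 4001))

print(primal_opt, dual_opt)   # equal: strong duality holds here
```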