Docsity

Optimization Methods and Algorithms, Study notes of Algorithms and Programming

These notes cover optimization methods and algorithms, including conjugate direction methods, gradient descent, the simplex algorithm, subgradients, quasi-Newton methods, and LP duality, along with topics such as LP standard form, the subdifferential, conic sets, and generalized gradient descent. They include mathematical equations and proof sketches and may be useful to students studying optimization, linear programming, and related topics.

Typology: Study notes

2021/2022

Uploaded on 05/11/2023

amoda 🇺🇸

I ♥ Grid Search

LPs
Standard form: $\min c^T x$ s.t. $Ax = b$, $x \ge 0$, $b \ge 0$.
Getting it to standard form:
Getting rid of $\ge, \le$: $x_1 \le 4 \to x_1 + x_2 = 4$, $x_2 \ge 0$.
Getting rid of negative vars: $x \in \mathbb{R} \to x = u - v$, $u, v \in \mathbb{R}_+$.
Bounded vars: $x \in [2, 5] \to 2 \le x$, $x \le 5$.

Simplex algorithm:
(1) Take the cost function and turn the problem into $\min z$ s.t. $c^T x = z$, with the remainder in standard LP form.
(2) Pivoting: do Gaussian elimination to get rid of as many variables as possible, without distributing the $z$ around.
(3) Variables that have been eliminated except in one equation are dependent/basic; the others are independent/non-basic. Can always get a feasible point by setting the non-basic variables to zero and reading out the basic variables:
$$\begin{bmatrix} 1 & 0 & C \\ 0 & I_m & A \end{bmatrix} \begin{bmatrix} -z \\ x_B \\ x_N \end{bmatrix} = \begin{bmatrix} -z_0 \\ b \end{bmatrix}$$
(4) Improve the solution: find the smallest reduced cost $C_J = \min_j C_j$. If $C_J \ge 0$, optimality is reached; quit. Else, $J$ is the incoming variable.
(5) Find how far we can go by picking the outgoing variable: $r = \arg\min_{i \,:\, A_{i,J} > 0} b_i / A_{i,J}$.
(6) Perform elimination to get rid of $J$, using the equation that makes the outgoing variable a basic one. That is, take the only equation in which the outgoing variable is non-zero, and eliminate the incoming variable with it.
(7) Repeat from (4) until optimality is reached.

Convex sets, fcns:
Defns: A set is X if every weighted sum of its points whose weights satisfy Y lies in the set.
Convex: $\sum_i \theta_i = 1$, $\theta_i \ge 0$.
Affine: $\sum_i \theta_i = 1$.
Conic: $\theta_i \ge 0$.
Examples: lines, line segments, hyperplanes, halfspaces, $L_p$ balls for $p \ge 1$, polyhedrons, polytopes.
Preserving operations: translation, scaling, intersection, affine functions (e.g., projection, coordinate dropping), set sum $\{c_1 + c_2 \mid c_1 \in C_1, c_2 \in C_2\}$, direct sum $\{(c_1, c_2) \mid c_1 \in C_1, c_2 \in C_2\}$, perspective projection.

Conv. Fcn.
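Defn: $f(\theta x + (1-\theta) y) \le \theta f(x) + (1-\theta) f(y)$ for $\theta \in [0, 1]$.
First-order condition (differentiable $f$): $f(y) \ge f(x) + \nabla f(x)^T (y - x)$.
Preserving operations, functions: non-negative weighted sum, pointwise max, affine map $f(Ax + b)$, composition, perspective map.

Strict, Strong Convexity
Defns:
Strict convexity: $f(\theta x + (1-\theta) y) < \theta f(x) + (1-\theta) f(y)$ for $x \ne y$, $\theta \in (0, 1)$ (basically, not linear).
m-Strong convexity:
$$f(\theta x + (1-\theta) y) \le \theta f(x) + (1-\theta) f(y) - \tfrac{1}{2} m \theta (1-\theta) \|x - y\|_2^2$$
Better strong convexity defns:
$$(\nabla f(x) - \nabla f(y))^T (x - y) \ge m \|x - y\|_2^2$$
$$f(y) \ge f(x) + \nabla f(x)^T (y - x) + \tfrac{m}{2} \|y - x\|_2^2$$
$$\nabla^2 f(x) \succeq mI$$

Gradient Descent
Given $x_0$, repeat $x_k = x_{k-1} - t_k \nabla f(x_{k-1})$.
Picking t: can diverge if $t$ is too big, too slow if $t$ is too small.
Backtracking line search: start with $t = 1$; while $f(x - t \nabla f(x)) > f(x) - \alpha t \|\nabla f(x)\|_2^2$, update $t = \beta t$, with $0 < \alpha < 1/2$, $0 < \beta < 1$.

Subgradients
Defn.: A subgradient of a convex $f$ at $x$ is any $g$ s.t. $f(y) \ge f(x) + g^T (y - x)$ for all $y$.
Subdifferential $\partial f(x)$: the set of all such $g$.
SG calculus: $\partial(af) = a\,\partial f$ for $a > 0$; $\partial(f_1 + f_2) = \partial f_1 + \partial f_2$; if $F(x) = f(Ax + b)$ then $\partial F(x) = A^T \partial f(Ax + b)$.
Finite pointwise max: $\partial \max_{f \in F} f(x)$ is the convex hull of the subdifferentials of the active functions (those achieving the max at $x$).
Norms: if $f(x) = \|x\|_p$ and $1/p + 1/q = 1$, then $\|x\|_p = \max_{\|z\|_q \le 1} z^T x$; thus
$$\partial \|x\|_p = \{ y : \|y\|_q \le 1,\ y^T x = \max_{\|z\|_q \le 1} z^T x \}$$
Optimality: $f(x^*) = \min_x f(x) \iff 0 \in \partial f(x^*)$.
Remember that subgradients may not exist for non-convex functions!

Subgradient Method
Given $x_0$, repeat $x_k = x_{k-1} - t_k g_{k-1}$, with $g_{k-1} \in \partial f(x_{k-1})$.
The SG method is not a descent method; keep track of the best point so far.
Picking t: square summable but not summable (e.g., $t_k = 1/k$).
Polyak steps: $t_k = (f(x_{k-1}) - f(x^*)) / \|g_{k-1}\|_2^2$.
Projected SG method: project onto the feasible set after taking a step.

Generalized GD
Suppose $f(x) = g(x) + h(x)$ with $g$ convex and differentiable, $h$ convex and not necessarily differentiable.
Define $\mathrm{prox}_t(x) = \arg\min_z \frac{1}{2t} \|x - z\|_2^2 + h(z)$; GGD is:
$$x_k = \mathrm{prox}_{t_k}(x_{k-1} - t_k \nabla g(x_{k-1}))$$
Called generalized gradient since if $G_t(x) = \frac{1}{t}\left(x - \mathrm{prox}_t(x - t \nabla g(x))\right)$ then the update is
$$x_k = x_{k-1} - t_k G_{t_k}(x_{k-1})$$
With backtracking: while
$$g(x - t G_t(x)) > g(x) - t \nabla g(x)^T G_t(x) + \tfrac{t}{2} \|G_t(x)\|_2^2$$
(maybe with $\alpha$ in last term?) update $t = \beta t$.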
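The generalized gradient descent update with backtracking can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the notes' own implementation: the function names `generalized_gd` and `soft_threshold`, the parameters `t0`, `beta`, `iters`, and the tiny rounding slack in the backtracking test are all assumptions introduced here. For the demo, $h$ is assumed to be $\lambda \|\cdot\|_1$, whose prox is entrywise soft-thresholding, and $g(x) = \frac{1}{2}\|x - y\|_2^2$.

```python
import numpy as np

def soft_threshold(x, tau):
    """Prox of tau * ||.||_1: shrink every entry toward zero by tau."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def generalized_gd(g, grad_g, prox, x0, t0=1.0, beta=0.5, iters=100):
    """Generalized (proximal) gradient descent with backtracking on t.

    prox(v, t) must return argmin_z 1/(2t) ||v - z||^2 + h(z).
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        t = t0
        gx, dgx = g(x), grad_g(x)
        while True:
            x_new = prox(x - t * dgx, t)
            G = (x - x_new) / t  # generalized gradient G_t(x)
            # Backtracking test from the notes (no alpha); the 1e-12 slack
            # only absorbs floating-point rounding and is an assumption.
            if g(x_new) <= gx - t * (dgx @ G) + 0.5 * t * (G @ G) + 1e-12:
                break
            t *= beta  # shrink the step and try again
        x = x_new
    return x

# Demo problem (hypothetical data): g(x) = 0.5 ||x - y||^2, h(x) = lam ||x||_1.
y = np.array([3.0, -0.5, 1.2])
lam = 1.0
g = lambda x: 0.5 * np.sum((x - y) ** 2)
grad_g = lambda x: x - y
prox = lambda v, t: soft_threshold(v, lam * t)

x_hat = generalized_gd(g, grad_g, prox, np.zeros(3))
print(x_hat)
```

For this particular $g$ the minimizer is known in closed form (it is $y$ soft-thresholded at $\lambda$), which makes the sketch easy to check. Passing `prox` as a callable keeps the loop independent of the choice of $h$.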
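Example (Lasso): Prox is $\arg\min_z \frac{1}{2t} \|\beta - z\|_2^2 + \lambda \|z\|_1 = S_{\lambda t}(\beta)$. $S_\lambda(\beta)$ is the soft-threshold operator,
$$[S_\lambda(\beta)]_i = \begin{cases} \beta_i - \lambda & : \beta_i > \lambda \\ 0 & : -\lambda \le \beta_i \le \lambda \\ \beta_i + \lambda & : \beta_i < -\lambda \end{cases}$$
Example (Matrix Completion): Objective:
$$\tfrac{1}{2} \sum_{(i,j)\ \mathrm{observed}} (Y_{i,j} - B_{i,j})^2 + \lambda \|B\|_*, \quad \|B\|_* = \sum_{i=1}^{r} \sigma_i(B)$$
Prox function: $\arg\min_Z \frac{1}{2t} \|B - Z\|_F^2 + \lambda \|Z\|_*$.
Solution: matrix soft-thresholding; $U \Sigma_{\lambda t} V^T$ where $B = U \Sigma V^T$ and $(\Sigma_{\lambda t})_{ii} = \max\{\Sigma_{ii} - \lambda t, 0\}$ (threshold $\lambda t$, matching the Lasso case).

Newton's Method:
Originally developed for finding roots; use it to find roots of the gradient. Want $\nabla f(x) + \nabla^2 f(x) \Delta x = 0$; the solution is $\Delta x = -[\nabla^2 f(x)]^{-1} \nabla f(x)$.
Damped Newton method: $x_{k+1} = x_k - h_k [\nabla^2 f(x_k)]^{-1} \nabla f(x_k)$.

Conjugate Direction methods:
Want to solve $\min \frac{1}{2} x^T Q x - b^T x$ with $Q \succ 0$. Define Q-orthogonality as $d_i^T Q d_j = 0$ for $i \ne j$.
Exp. subspace thm.: Let $\{d_i\}_{i=0}^{n-1}$ be Q-conjugate. (For the method:)
$$g_k = Q x_k - b, \quad x_{k+1} = x_k + \alpha_k d_k, \quad \alpha_k = -\frac{g_k^T d_k}{d_k^T Q d_k}$$
Proof sketch ($g_k \perp B_k$, where $B_k = \mathrm{span}\{d_0, \dots, d_{k-1}\}$) by induction:
$$g_{k+1} = Q x_{k+1} - b = Q(x_k + \alpha_k d_k) - b = (Q x_k - b) + \alpha_k Q d_k = g_k + \alpha_k Q d_k$$
From here, by the defn of $\alpha_k$,
$$d_k^T g_{k+1} = d_k^T (g_k + \alpha_k Q d_k) = d_k^T g_k + \alpha_k d_k^T Q d_k = d_k^T g_k - g_k^T d_k = 0$$
Algorithm: Arbitrary $x_0$; set $d_0 = -g_0 = b - Q x_0$; repeat:
$$\alpha_k = -\frac{g_k^T d_k}{d_k^T Q d_k}; \quad x_{k+1} = x_k + \alpha_k d_k$$
$$g_{k+1} = Q x_{k+1} - b; \quad \beta_k = \frac{g_{k+1}^T Q d_k}{d_k^T Q d_k}; \quad d_{k+1} = -g_{k+1} + \beta_k d_k$$

Quasi-Newton Methods:
Gist: approximate the Hessian/inverse Hessian.
Symmetric rank-one correction:
Update: $x_{k+1} = x_k - \alpha_k H_k g_k$, with $\alpha_k = \arg\min_\alpha f(x_k - \alpha H_k g_k)$ (line search) and $g_k = \nabla f(x_k)$.
$$H_{k+1} = H_k + \frac{(p_k - H_k q_k)(p_k - H_k q_k)^T}{q_k^T (p_k - H_k q_k)}, \quad p_k = x_{k+1} - x_k, \quad q_k = g_{k+1} - g_k$$
Might not be PSD!
DFP (Rank 2):
$$H_{k+1} = H_k + \frac{p_k p_k^T}{p_k^T q_k} - \frac{H_k q_k q_k^T H_k}{q_k^T H_k q_k}$$
BFGS (update the inverse of the Hessian via Sherman-Morrison): Let $q_k = g_{k+1} - g_k$.
$$H_{k+1} = H_k + \left(1 + \frac{q_k^T H_k q_k}{p_k^T q_k}\right) \frac{p_k p_k^T}{p_k^T q_k} - \frac{p_k q_k^T H_k + H_k q_k p_k^T}{q_k^T p_k}$$

LP Duality
Let $c \in \mathbb{R}^n$, $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$, $G \in \mathbb{R}^{r \times n}$, $h \in \mathbb{R}^r$.
(P) $\min c^T x$ s.t. $Ax = b$, $Gx \le h$.
(D) $\max -b^T u - h^T v$ s.t. $-A^T u - G^T v = c$, $v \ge 0$.

Duality:
Consider $\min f(x)$ s.t. $h_i(x) \le 0$, $i = 1, \dots, m$; $l_j(x) = 0$, $j = 1, \dots, r$.
Lagrangian:
$$L(x, u, v) = f(x) + \sum_{i=1}^{m} u_i h_i(x) + \sum_{j=1}^{r} v_j l_j(x)$$
with $u \in \mathbb{R}^m$, $v \in \mathbb{R}^r$ and $u \ge 0$.
Note: $f(x) \ge L(x, u, v)$ at feasible $x$.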
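The LP pair (P)/(D) above can be sanity-checked numerically by the same argument that gives $f(x) \ge L(x, u, v)$ at feasible $x$. The sketch below is an illustration under stated assumptions, not part of the original notes: the data are random, and feasibility is forced by construction (pick $x$ first and define $b$, $h$ around it; pick $u$, $v \ge 0$ first and define $c$ from the dual constraint). All variable names such as `slack`, `primal_value`, and `gap` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, r = 5, 2, 3  # number of variables, equalities, inequalities

A = rng.normal(size=(m, n))
G = rng.normal(size=(r, n))

# Primal-feasible x for (P): choose x, then set b = Ax and h = Gx + slack.
x = rng.normal(size=n)
slack = rng.uniform(0.0, 1.0, size=r)  # h - Gx, nonnegative by construction
b = A @ x
h = G @ x + slack

# Dual-feasible (u, v) for (D): choose u and v >= 0, then set
# c = -A^T u - G^T v so the dual equality constraint holds exactly.
u = rng.normal(size=m)
v = rng.uniform(0.0, 1.0, size=r)
c = -A.T @ u - G.T @ v

primal_value = c @ x           # objective of (P) at x
dual_value = -b @ u - h @ v    # objective of (D) at (u, v)
gap = primal_value - dual_value

# Weak duality: the gap is nonnegative and equals v^T (h - Gx).
print(gap, v @ slack)
```

Expanding $c^T x$ with $c = -A^T u - G^T v$ shows the gap is exactly $v^T (h - Gx)$, so it vanishes only when every inequality with $v_i > 0$ is tight.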
Dual problem:
Let $g(u, v) = \min_x L(x, u, v)$; $g$ is the Lagrange dual function. The dual problem is $\max_{u \ge 0, v} g(u, v)$.
Note: the dual function $g$ is always concave, so the dual problem is always a concave maximization.

Strong duality:
Always have $f^* \ge g^*$ (weak duality), where $f^*$, $g^*$ are the optimal primal and dual objective values. When $f^* = g^*$, have strong duality. If the primal is a convex problem ($f$, $h_i$ convex, $l_j$ affine) and there exists a strictly feasible $x$, then strong duality holds.

Dual example (lasso): Have primal: