LPs
Standard form: min c^T x s.t. Ax = b, x ≥ 0, b ≥ 0.
Getting it to standard form:
Getting rid of ≥, ≤ (add a slack variable): x_1 ≤ 4 → x_1 + x_2 = 4, x_2 ≥ 0.
Getting rid of free vars: x ∈ R → x = u − v, u, v ≥ 0.
Bounded vars: x ∈ [2, 5] → 2 ≤ x, x ≤ 5, then handle each inequality as above.
Simplex algorithm:
(1) Take the cost function, turn it into min z s.t. c^T x = z, remainder in standard LP form.
(2) Pivoting: do Gaussian elimination to get rid of as many variables as possible, without distributing z around.
(3) Variables that have been eliminated except in one equation are dependent/basic; the others are independent/non-basic. We can always get a feasible point by setting the non-basic variables to zero and reading out the basic variables. In tableau form:
[ 1  0   C ]
[ 0  I_m A ] · [−z, x_B, x_N]^T = [−z_0, b]^T
(4) Improve the solution: find the smallest reduced cost C_J. If C_J ≥ 0, optimality is reached; quit. Else, column J is incoming.
(5) Find how far we can go by picking the outgoing variable: r = argmin_{i : A_{i,J} > 0} b_i / A_{i,J}.
(6) Perform elimination to get rid of J, using the equation that makes the outgoing variable a basic one. That is, take the only equation in which the outgoing variable is non-zero, and eliminate the incoming variable with it.
(7) Repeat from (4) until optimality is reached.
Convex sets, fcns:
Defns: a set is X if, for any weighted sum of its points satisfying Y, the weighted sum is in the set. Convex: Σ_i θ_i = 1, θ_i ≥ 0. Affine: Σ_i θ_i = 1. Conic: θ_i ≥ 0.
Examples: lines, line segments, hyperplanes, halfspaces, L_p balls for p ≥ 1, polyhedra, polytopes.
Preserving operations: translation, scaling, intersection, affine functions (e.g., projection, coordinate dropping), set sum {c_1 + c_2 | c_1 ∈ C_1, c_2 ∈ C_2}, direct sum {(c_1, c_2) | c_1 ∈ C_1, c_2 ∈ C_2}, perspective projection.
Conv. fcns:
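Steps (3)–(7) can be sketched as a small tableau simplex. This is a minimal illustration, not a robust solver: the toy LP (max x1 + 2x2 s.t. x1 + x2 ≤ 4, x1 ≤ 3), the helper name `simplex`, and the assumption of b ≥ 0 with a starting slack basis are all illustrative; degeneracy and unboundedness are not handled.

```python
import numpy as np

def simplex(c, A, b, basis):
    """Tableau simplex for min c@x s.t. A@x = b, x >= 0 (b >= 0),
    starting from a feasible basis (column indices forming an identity)."""
    m, n = A.shape
    T = np.zeros((m + 1, n + 1))
    T[:m, :n], T[:m, n] = A, b
    T[m, :n] = c
    for i, j in enumerate(basis):          # zero out reduced costs of basic vars
        T[m] -= T[m, j] * T[i]
    while True:
        j = int(np.argmin(T[m, :n]))       # incoming: most negative reduced cost
        if T[m, j] >= -1e-12:              # all reduced costs >= 0: optimal
            break
        ratios = [(T[i, n] / T[i, j], i) for i in range(m) if T[i, j] > 1e-12]
        _, r = min(ratios)                 # outgoing row via the ratio test
        T[r] /= T[r, j]                    # pivot: eliminate column j elsewhere
        for i in range(m + 1):
            if i != r:
                T[i] -= T[i, j] * T[r]
        basis[r] = j
    x = np.zeros(n)
    x[basis] = T[:m, n]
    return x, -T[m, n]                     # solution and optimal objective

# max x1 + 2*x2 s.t. x1 + x2 <= 4, x1 <= 3, x >= 0, in standard form with slacks x3, x4
c = np.array([-1.0, -2.0, 0.0, 0.0])
A = np.array([[1.0, 1.0, 1.0, 0.0],
              [1.0, 0.0, 0.0, 1.0]])
b = np.array([4.0, 3.0])
x, val = simplex(c, A, b, basis=[2, 3])
print(x, val)                              # x = (0, 4, 0, 3), objective -8
```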
Defn: f(θx + (1−θ)y) ≤ θf(x) + (1−θ)f(y); first-order characterization: f(y) ≥ f(x) + ∇f(x)^T (y − x).
Preserving operations, functions: non-negative weighted sum, pointwise max, affine map f(Ax + b), composition (under suitable monotonicity conditions), perspective map.
Strict, strong convexity
Defns:
Strict convexity: f(θx + (1−θ)y) < θf(x) + (1−θ)f(y) for x ≠ y, θ ∈ (0, 1) (basically, nowhere linear).
m-strong convexity: f(θx + (1−θ)y) ≤ θf(x) + (1−θ)f(y) − (m/2)θ(1−θ)||x − y||_2^2.
Better strong convexity defns:
(∇f(x) − ∇f(y))^T (x − y) ≥ m||x − y||_2^2
f(y) ≥ f(x) + ∇f(x)^T (y − x) + (m/2)||y − x||_2^2
∇^2 f(x) ⪰ mI.
Gradient descent
Given x^0, repeat x^k = x^{k−1} − t_k ∇f(x^{k−1}).
Picking t: can diverge if t is too big, too slow if t is too small.
Backtracking line search: start with t = 1; while f(x − t∇f(x)) > f(x) − αt||∇f(x)||_2^2, update t = βt, with 0 < α ≤ 1/2, 0 < β < 1.
Subgradients
Defn.: a subgradient of convex f at x is any g s.t. f(y) ≥ f(x) + g^T (y − x) for all y.
Subdifferential ∂f(x): the set of all such g.
SG calculus: ∂(af) = a ∂f (a ≥ 0); ∂(f_1 + f_2) = ∂f_1 + ∂f_2; ∂[f(Ax + b)] = A^T ∂f(Ax + b).
Finite pointwise max: ∂ max_{f ∈ F} f(x) is the convex hull of the subdifferentials of the active functions (those achieving the max at x).
Norms: if f(x) = ||x||_p and 1/p + 1/q = 1, then ||x||_p = max_{||z||_q ≤ 1} z^T x; thus ∂||x||_p = {y : ||y||_q ≤ 1, y^T x = max_{||z||_q ≤ 1} z^T x}.
Optimality: f(x*) = min_x f(x) ⟺ 0 ∈ ∂f(x*).
Remember that subgradients may not exist for non-convex functions!
Subgradient method
Given x^0, repeat x^k = x^{k−1} − t_k g^{k−1} with g^{k−1} ∈ ∂f(x^{k−1}).
The SG method is not a descent method; keep track of the best iterate so far.
Picking t: square summable but not summable (e.g., t_k = 1/k). Polyak steps: t_k = (f(x^{k−1}) − f(x*))/||g^{k−1}||_2^2.
Projected SG method: project onto the feasible set after taking each step.
Generalized GD
Suppose f(x) = g(x) + h(x) with g convex and differentiable, h convex but not necessarily differentiable. Define prox_t(x) = argmin_z (1/(2t))||x − z||_2^2 + h(z); GGD is:
x^k = prox_{t_k}(x^{k−1} − t_k ∇g(x^{k−1}))
"Generalized gradient" since, if G_t(x) = (1/t)(x − prox_t(x − t∇g(x))), the update is x^k = x^{k−1} − t_k G_{t_k}(x^{k−1}).
With backtracking: while g(x − tG_t(x)) > g(x) − t∇g(x)^T G_t(x) + (t/2)||G_t(x)||_2^2, update t = βt.
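A minimal sketch of gradient descent with the backtracking rule above, on a made-up strongly convex quadratic (the particular Q, b, and the constants α = 0.3, β = 0.8 are illustrative choices, not from the notes):

```python
import numpy as np

# f(x) = 0.5 x^T Q x - b^T x, gradient Q x - b, minimizer solves Q x = b
Q = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
f = lambda x: 0.5 * x @ Q @ x - b @ x
grad = lambda x: Q @ x - b

alpha, beta = 0.3, 0.8          # 0 < alpha <= 1/2, 0 < beta < 1
x = np.zeros(2)
for _ in range(100):
    g = grad(x)
    t = 1.0                     # restart backtracking from t = 1 each iteration
    while f(x - t * g) > f(x) - alpha * t * (g @ g):
        t *= beta               # shrink until sufficient decrease holds
    x = x - t * g

print(x, np.linalg.solve(Q, b))  # iterate vs. exact minimizer
```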
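The subgradient method with the t_k = 1/k (square-summable-not-summable) schedule, sketched on the hypothetical non-differentiable f(x) = |x − 3|; since it is not a descent method, the best iterate is tracked:

```python
def f(x):
    return abs(x - 3.0)

def subgrad(x):
    # any element of the subdifferential works; at the kink, 0 is in [-1, 1]
    if x > 3.0:
        return 1.0
    if x < 3.0:
        return -1.0
    return 0.0

x, best_x, best_f = 0.0, 0.0, f(0.0)
for k in range(1, 2001):
    x = x - (1.0 / k) * subgrad(x)   # diminishing step t_k = 1/k
    if f(x) < best_f:                # keep track of best so far
        best_x, best_f = x, f(x)

print(best_x, best_f)                # best iterate approaches the minimizer 3
```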
Example (lasso): the prox is argmin_z (1/(2t))||β − z||_2^2 + λ||z||_1 = S_{λt}(β), where S_λ is the soft-threshold operator:
[S_λ(β)]_i = β_i − λ if β_i > λ; 0 if −λ ≤ β_i ≤ λ; β_i + λ if β_i < −λ.
Example (matrix completion): objective (1/2) Σ_{(i,j) observed} (Y_{ij} − B_{ij})^2 + λ||B||_* with ||B||_* = Σ_{i=1}^r σ_i(B). Prox function: argmin_Z (1/(2t))||B − Z||_F^2 + λ||Z||_*. Solution: matrix soft-thresholding, U Σ_{λt} V^T where B = U Σ V^T and (Σ_{λt})_{ii} = max{Σ_{ii} − λt, 0}.
Newton's method: originally developed for finding roots; use it to find roots of the gradient. Want ∇f(x) + ∇^2 f(x) Δx = 0; the solution is Δx = −[∇^2 f(x)]^{−1} ∇f(x). Damped Newton method: x^{k+1} = x^k − h_k [∇^2 f(x^k)]^{−1} ∇f(x^k).
Conjugate direction methods: want to solve min (1/2) x^T Q x − b^T x with Q ≻ 0. Define Q-orthogonality (conjugacy) as d_i^T Q d_j = 0 for i ≠ j.
Expanding subspace thm.: let {d_i}_{i=0}^{n−1} be Q-conjugate. (For the method:)
g_k = Q x_k − b; α_k = −g_k^T d_k / (d_k^T Q d_k); x_{k+1} = x_k + α_k d_k.
Proof sketch (g_k ⊥ B_k), by induction:
g_{k+1} = Q x_{k+1} − b = Q(x_k + α_k d_k) − b = (Q x_k − b) + α_k Q d_k = g_k + α_k Q d_k.
From here, by the defn of α_k:
d_k^T g_{k+1} = d_k^T (g_k + α_k Q d_k) = d_k^T g_k − g_k^T d_k = 0.
Algorithm (conjugate gradient): arbitrary x_0, d_0 = −g_0 = b − Q x_0; repeat
α_k = −g_k^T d_k / (d_k^T Q d_k); x_{k+1} = x_k + α_k d_k;
g_{k+1} = Q x_{k+1} − b; β_k = g_{k+1}^T Q d_k / (d_k^T Q d_k); d_{k+1} = −g_{k+1} + β_k d_k.
Quasi-Newton methods
Gist: approximate the Hessian/inverse Hessian.
Symmetric rank-one (SR1) correction:
Update: x_{k+1} = x_k − α_k H_k g_k, with α_k = argmin_α f(x_k − α H_k g_k) (line search) and g_k = ∇f(x_k).
H_{k+1} = H_k + (p_k − H_k q_k)(p_k − H_k q_k)^T / (q_k^T (p_k − H_k q_k)),
p_k = x_{k+1} − x_k; q_k = g_{k+1} − g_k.
Might not be PSD!
DFP (rank 2): H_{k+1} = H_k + p_k p_k^T / (p_k^T q_k) − H_k q_k q_k^T H_k / (q_k^T H_k q_k).
BFGS: update the inverse of the Hessian (via Sherman–Morrison). Let q_k = g_{k+1} − g_k:
H_{k+1} = H_k + (1 + q_k^T H_k q_k / (p_k^T q_k)) p_k p_k^T / (p_k^T q_k) − (p_k q_k^T H_k + H_k q_k p_k^T) / (p_k^T q_k).
LP duality
Let c ∈ R^n, A ∈ R^{m×n}, b ∈ R^m, G ∈ R^{r×n}, h ∈ R^r.
(P) min c^T x s.t. Ax = b, Gx ≤ h.
(D) max −b^T u − h^T v s.t. −A^T u − G^T v = c, v ≥ 0.
Duality: consider min f(x) s.t. h_i(x) ≤ 0, i = 1, …, m; l_j(x) = 0, j = 1, …, r.
Lagrangian: L(x, u, v) = f(x) + Σ_{i=1}^m u_i h_i(x) + Σ_{j=1}^r v_j l_j(x), with u ∈ R^m, v ∈ R^r, u ≥ 0.
Note: f(x) ≥ L(x, u, v) at feasible x.
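The lasso prox step gives the ISTA-style generalized GD loop: a gradient step on the smooth part g(β) = (1/2)||y − Xβ||², then soft-thresholding at level λt. A minimal sketch on made-up toy data (X, y, λ, and the fixed step t = 1/L are illustrative, not from the notes):

```python
import numpy as np

def soft_threshold(b, lam):
    # [S_lam(b)]_i = sign(b_i) * max(|b_i| - lam, 0)
    return np.sign(b) * np.maximum(np.abs(b) - lam, 0.0)

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 5))
true_beta = np.array([2.0, 0.0, 0.0, -1.0, 0.0])
y = X @ true_beta                         # noiseless toy response
lam = 0.5

t = 1.0 / np.linalg.norm(X.T @ X, 2)      # step 1/L, L = largest eigenvalue
beta = np.zeros(5)
for _ in range(500):
    # gradient step on g, then prox of lam*t * ||.||_1
    beta = soft_threshold(beta + t * X.T @ (y - X @ beta), lam * t)

print(beta)   # roughly sparse, close to true_beta (slightly shrunk)
```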
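The conjugate gradient recursion can be sketched directly from the α_k, β_k formulas above; the 3×3 SPD matrix Q and vector b are made up, and in exact arithmetic the method terminates in n = 3 steps:

```python
import numpy as np

Q = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
b = np.array([1.0, 2.0, 3.0])

x = np.zeros(3)
g = Q @ x - b                    # g_0
d = -g                           # d_0 = -g_0 = b - Q x_0
for _ in range(3):               # n steps for an n x n SPD system
    a = -(g @ d) / (d @ Q @ d)   # alpha_k
    x = x + a * d
    g = Q @ x - b                # g_{k+1}
    bk = (g @ Q @ d) / (d @ Q @ d)   # beta_k
    d = -g + bk * d              # next Q-conjugate direction

print(x, np.linalg.solve(Q, b))  # should agree
```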
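A tiny sketch of (damped) Newton on a hypothetical 1-D convex example, f(x) = x − log x on x > 0, so ∇f = 1 − 1/x, ∇²f = 1/x², and the minimizer is x* = 1 (here h_k = 1, i.e., undamped):

```python
x = 0.5
for _ in range(20):
    grad = 1.0 - 1.0 / x          # f'(x)
    hess = 1.0 / x**2             # f''(x)
    x = x - 1.0 * (grad / hess)   # x - h_k [f''(x)]^{-1} f'(x), h_k = 1

print(x)   # converges quadratically to the minimizer 1
```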
Dual problem: let g(u, v) = min_x L(x, u, v); g is the Lagrange dual function. Dual problem: max_{u ≥ 0, v} g(u, v).
Note: the dual problem is always concave (a pointwise min of functions affine in (u, v)).
Strong duality: we always have f* ≥ g* (weak duality), where f*, g* are the primal and dual optimal values. When f* = g*, we have strong duality. If the primal is a convex problem (f, h_i convex, l_j affine) and there exists a strictly feasible x (Slater's condition), then strong duality holds.
Dual example (lasso): Have primal:
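A numeric check of weak/strong duality on a made-up toy problem: min x² s.t. 1 − x ≤ 0. The Lagrangian is L(x, u) = x² + u(1 − x) with u ≥ 0; minimizing over x gives x = u/2, so g(u) = u − u²/4, and Slater's condition holds (x = 2 is strictly feasible):

```python
# primal: min x^2 s.t. 1 - x <= 0, optimum f* = 1 at x = 1
primal_opt = 1.0

# dual function g(u) = min_x L(x, u) = u - u^2/4 (closed form, x* = u/2)
g = lambda u: u - u * u / 4.0

# maximize g over a grid of u >= 0; max is at u = 2 with g(2) = 1
dual_opt = max(g(i / 1000.0) for i in range(0, 4001))

print(primal_opt, dual_opt)   # equal: strong duality holds here
```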