Greedy Algorithm

• A greedy algorithm always makes the choice that looks best at the moment
• Key point: greed makes a locally optimal choice in the hope that this choice will lead to a globally optimal solution
• Note: greedy algorithms do not always yield optimal solutions, but for SOME problems they do

Greed

• When do we use greedy algorithms?
  – When we need a heuristic (e.g., hard problems like the Traveling Salesman Problem)
  – When the problem itself is “greedy”
• Greedy Choice Property (CLRS 16.2)
• Optimal Substructure Property (shared with DP) (CLRS 16.2)
• Examples:
  – Minimum Spanning Tree (Kruskal’s algorithm)
  – Optimal Prefix Codes (Huffman’s algorithm)

Proof of Kruskal’s Algorithm

• Basis: |T| = 0, trivial.
• Induction step: T is promising by the I.H., so it is a subgraph of some MST, call it S. Let ei be the smallest edge in E such that T ∪ {ei} has no cycle and ei ∉ T.
• If ei ∈ S, we’re done.
• Suppose ei ∉ S. Then S ∪ {ei} has a unique cycle containing ei, and every other arc in the cycle has cost ≤ c(ei) (because S is an MST!).
• Call the cycle C. Observe that C cannot lie entirely inside T ∪ {ei}, because T ∪ {ei} is acyclic (Kruskal adds ei only when it creates no cycle).
• Then C must contain some edge ej with ej ∉ T, and c(ej) ≥ c(ei): T ∪ {ej} ⊆ S is also acyclic, so Kruskal’s choice of ei as the cheapest addable edge forces c(ej) ≥ c(ei).
• Let S’ = S ∪ {ei} \ {ej}.
• S’ is a spanning tree of cost ≤ c(S), hence an MST, and it contains T ∪ {ei}; so T ∪ {ei} is promising.

(Figure: the cycle C in S ∪ {ei}, showing the swap of ej for ei.)

Greedy Algorithm: Huffman Codes

• Prefix codes
  – one code per input symbol
  – no code is a prefix of another
• Why prefix codes?
  – Easy decoding: since no codeword is a prefix of any other, the codeword that begins an encoded file is unambiguous
  – Identify the initial codeword, translate it back to the original character, and repeat the decoding process on the remainder of the encoded file

Greedy Algorithm: Huffman Codes

(Figures, Steps 1–8: the Huffman tree is built bottom-up by repeatedly merging the two lowest-weight subtrees; the diagrams show leaf pairs such as (A, M), (T, U), (V, Y), (O, R), (S, G), (I, E) being merged under internal weights 2, 4, 5, 7, ….)

Proof That Huffman’s Merge is Optimal

• Let T be an optimal prefix-code tree in which a, b are siblings at the deepest level, so L(a) = L(b)
• Suppose that x, y are two other nodes that are merged by the Huffman algorithm
  – x, y have the lowest weights, because Huffman chose them
  – WLOG w(x) ≤ w(a), w(y) ≤ w(b); L(a) = L(b) ≥ L(x), L(y)
  – Swap a and x: the cost difference between T and the new tree T’ is
    w(x)L(x) + w(a)L(a) – w(x)L(a) – w(a)L(x) = (w(a) – w(x))(L(a) – L(x)) ≥ 0 // both factors non-negative
  – A similar argument for b, y ⇒ the Huffman choice is also optimal

(Figure: tree T, with deepest siblings a, b and Huffman’s choices x, y.)

Dynamic Programming

• Dynamic programming: divide the problem into overlapping subproblems; recursively solve each in the same way.
• Similar to divide-and-conquer (DQ), so what’s the difference?
  – DQ partitions the problem into independent subproblems.
  – DP breaks it into overlapping subproblems, that is, subproblems that share sub-subproblems.
  – So DP saves work compared with DQ by solving every subproblem just once (when subproblems overlap).

Elements of Dynamic Programming

• Optimal substructure: a problem exhibits optimal substructure if an optimal solution to the problem contains within it optimal solutions to subproblems. Whenever a problem exhibits optimal substructure, it is a good clue that DP might apply (a greedy method might apply also).
• Overlapping subproblems: a recursive algorithm for the problem solves the same subproblems over and over, rather than always generating new subproblems.

Dynamic Programming: Matrix Chain Product

• Cost of the final multiplication? Split A1 • A2 • … • Ak–1 | Ak • … • An: the two factors have dimensions d1 × dk and dk × dn+1, so the final multiplication costs d1 dk dn+1.
• Each of these subproblems can be solved optimally – just look in the table.

Dynamic Programming: Matrix Chain Product

• FORMULATION:
  – Table entries aij, 1 ≤ i ≤ j ≤ n, where aij = optimal solution = min # multiplications for Ai • Ai+1 • … • Aj
  – Let the dimensions be given by the vector di, 1 ≤ i ≤ n+1, i.e., Ai is di × di+1
  – We fill the whole table; the answer is a1n.

Dynamic Programming: Matrix Chain Product

• Build the table: diagonal S contains the entries aij with j – i = S.
  S = 0: aii = 0, i = 1, 2, …, n
  S = 1: ai,i+1 = di di+1 di+2, i = 1, 2, …, n–1
  1 < S < n: ai,i+s = min { ai,k + ak+1,i+s + di dk+1 di+s+1 : i ≤ k < i+s }
• Example (Brassard/Bratley): 4 matrices, d = (13, 5, 89, 3, 34)
  S = 1: a12 = 13·5·89 = 5785, a23 = 5·89·3 = 1335, a34 = 89·3·34 = 9078

Dynamic Programming: Longest Common Subsequence

Let L(k, l) denote the length of the LCS of [a1 a2 … ak] and [b1 b2 … bl]. Then we have these facts:
• L(p, q) ≥ L(p–1, q–1).
1) L(p, q) = L(p–1, q–1) + 1 if ap = bq, when ap and bq are both in the LCS.
2) L(p, q) ≥ L(p–1, q) when ap is not in the LCS.
3) L(p, q) ≥ L(p, q–1) when bq is not in the LCS.

Dynamic Programming: Longest Common Subsequence

• ALGORITHM:
  for i = 1 to m
    for j = 1 to n
      if ai = bj then L(i, j) = L(i–1, j–1) + 1
      else L(i, j) = max{ L(i, j–1), L(i–1, j) }
• Time complexity: Θ(mn) for sequences of lengths m and n.

Dynamic Programming: Knapsack

• The problem: the knapsack problem is a particular type of integer program with just one constraint. Each item that can go into the knapsack has a size and a benefit, and the knapsack has a certain capacity. What should go into the knapsack so as to maximize the total benefit?
• Hint: recall the shortest-path method. Define Fk(y) for 0 ≤ k ≤ n and 0 ≤ y ≤ b.
• Then, what is Fk(y)? The maximum value possible using only the first k items when the weight limit is y.
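The table Fk(y) can be filled by the recurrence the notes’ example uses, Fk(y) = max(Fk–1(y), Fk(y – wk) + vk). A minimal Python sketch follows; the weights and values are hypothetical stand-ins, since the notes do not give the example’s item data in full:

```python
def knapsack_table(w, v, b):
    """F[k][y] = best value using only items 1..k with weight limit y.

    Items may be used any number of times (x_j >= 0 integer), matching
    the formulation above. w[0] and v[0] are unused padding so that
    item k lives at index k, as in the notes' 1-based indexing.
    """
    n = len(w) - 1
    F = [[0] * (b + 1) for _ in range(n + 1)]
    for k in range(1, n + 1):
        for y in range(b + 1):
            F[k][y] = F[k - 1][y]            # don't use item k at all
            if y >= w[k]:                    # or use one more copy of item k
                F[k][y] = max(F[k][y], F[k][y - w[k]] + v[k])
    return F

# Hypothetical data: weights 2, 3, 5, 7; values 1, 3, 6, 9; capacity 10.
w = [0, 2, 3, 5, 7]
v = [0, 1, 3, 6, 9]
F = knapsack_table(w, v, 10)
print(F[4][10])  # → 12 (e.g., one item 4 plus one item 2: 9 + 3)
```

Each entry costs O(1) given the earlier rows, so the whole table costs O(nb) time, just like the shortest-path analogy suggests.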
Fk(y) = max { v1x1 + … + vkxk : w1x1 + … + wkxk ≤ y, xj ≥ 0 integer }

Dynamic Programming: Knapsack

• Note: F4(10) = max(F3(10), v4 + F4(10 – w4)) = max(11, 9 + F4(3)) = max(11, 9 + 3) = 12
• What is missing here? (As in SP: we know the shortest path’s cost, but we don’t know the shortest path itself…)
• So, we need another table:
  i(k, y) = the largest index j such that item j is used in Fk(y), i.e., i(k, y) = j ⇒ xj ≥ 1 and xq = 0 ∀q > j
• B.C.’s: i(1, y) = 0 if F1(y) = 0
         i(1, y) = 1 if F1(y) ≠ 0
• General:
  i(k, y) = i(k–1, y)  if Fk–1(y) > Fk(y – wk) + vk
  i(k, y) = k          if Fk–1(y) ≤ Fk(y – wk) + vk

Dynamic Programming: Knapsack

• Trace back: if i(k, y) = q, use item q once, then check i(k, y – wq).
• Example: the table of i(k, y):

  k\y   1  2  3  4  5  6  7  8  9  10
   4    0  1  2  3  3  3  4  3  4   4
   3    0  1  2  3  3  3  3  3  3   3
   2    0  1  2  2  2  2  2  2  2   2
   1    0  1  1  1  1  1  1  1  1   1

• E.g. F4(10) = 12, and i(4, 10) = 4 ⇒ the 4th item is used once.
  i(4, 10 – w4) = i(4, 3) = 2 ⇒ the 2nd item is used once.
  i(4, 3 – w2) = i(4, 0) = 0 ⇒ done.
• Notice i(4, 8) = 3 ⇒ there the most valuable item is not used.
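The index table i(k, y) and the trace-back can be sketched together as below. The weights and values are again hypothetical assumptions (the notes’ original data is not fully given), so the resulting i-table need not match the example table entry for entry — ties between “use item k” and “don’t” may be broken differently:

```python
def knapsack_with_traceback(w, v, b):
    """Return (best value, list of item indices used), items 1-indexed.

    F[k][y] is the value table; I[k][y] plays the role of i(k, y): the
    recurrence records k whenever taking one more copy of item k does
    at least as well as leaving it out, and inherits i(k-1, y) otherwise.
    """
    n = len(w) - 1                   # w[0], v[0] are unused padding
    F = [[0] * (b + 1) for _ in range(n + 1)]
    I = [[0] * (b + 1) for _ in range(n + 1)]
    for k in range(1, n + 1):
        for y in range(b + 1):
            use_k = F[k][y - w[k]] + v[k] if y >= w[k] else -1
            if F[k - 1][y] > use_k:  # item k does not help here
                F[k][y] = F[k - 1][y]
                I[k][y] = I[k - 1][y]
            else:                    # take one copy of item k
                F[k][y] = use_k
                I[k][y] = k
    # Trace back: while some item q is recorded, use it once and
    # drop the remaining capacity by w[q].
    items, y = [], b
    while I[n][y] != 0:
        q = I[n][y]
        items.append(q)
        y -= w[q]
    return F[n][b], items

# Hypothetical data: weights 2, 3, 5, 7; values 1, 3, 6, 9; capacity 10.
w = [0, 2, 3, 5, 7]
v = [0, 1, 3, 6, 9]
print(knapsack_with_traceback(w, v, 10))  # → (12, [4, 2])
```

Note how the trace-back stays in row n: with repetitions allowed, using item q once leaves the same item set available for the remaining capacity, exactly as in the notes’ i(4, 10) → i(4, 3) → i(4, 0) walk.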