Design and analysis of algorithms
Lecture 22 & 23
Edyta Szymańska, edyta@cc.gatech.edu
CS3510 A, Fall 2005

Huffman codes - another greedy algorithm

Task: store a map of a chromosome, i.e., a string of 130 million characters over the alphabet A, C, G, T.

How to do this? The default of 1 byte per character is wasteful: 2 bits per character suffice, e.g., A : 00, C : 01, G : 10, T : 11, for a total of 130 · 10^6 × 2 = 260 megabits.

Extra information: the characters appear in the string with different frequencies, namely f[A] = 70 · 10^6, f[C] = 3 · 10^6, f[G] = 20 · 10^6, f[T] = 37 · 10^6. Thus it should pay off to assign a shorter bit string to A than to C.

Prefix codes

Prefix code: no codeword is a prefix of a different codeword (this avoids ambiguity in decoding).
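The fixed-length cost and the prefix condition are easy to check mechanically. Below is a minimal Python sketch, assuming frequencies are counted in millions; the helpers `encoded_bits` and `is_prefix_free` and the example code `AMBIGUOUS` are hypothetical, made up for illustration.

```python
# Character frequencies from the notes, in millions of occurrences.
FREQ = {"A": 70, "C": 3, "G": 20, "T": 37}

# Fixed-length code: 2 bits per character.
FIXED = {"A": "00", "C": "01", "G": "10", "T": "11"}

# Hypothetical code that violates the prefix condition ("0" is a prefix of "01").
AMBIGUOUS = {"A": "0", "C": "01", "G": "1", "T": "11"}


def encoded_bits(code, freq):
    """Total encoding length: sum over characters c of f[c] * |code[c]|."""
    return sum(freq[c] * len(code[c]) for c in freq)


def is_prefix_free(code):
    """True if no codeword is a prefix of a different character's codeword."""
    words = list(code.values())
    for i, u in enumerate(words):
        for j, w in enumerate(words):
            if i != j and w.startswith(u):
                return False
    return True


print(encoded_bits(FIXED, FREQ))   # 260 (million bits)
print(is_prefix_free(FIXED))       # True
print(is_prefix_free(AMBIGUOUS))   # False
```

In `AMBIGUOUS`, the bit string 01 could be read either as C or as A followed by G, which is exactly the ambiguity the prefix condition rules out.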
The following is a proper prefix code: A : 0, C : 110, G : 111, T : 10. Total number of bits used: (70 · 1 + 3 · 3 + 20 · 3 + 37 · 2) · 10^6 = 213 · 10^6, i.e., 213 megabits instead of 260, an improvement of about 18%.

Representation: a binary tree whose leaves are the characters. Edges are labeled 0 (left child) and 1 (right child), each internal node is labeled with the total frequency (in millions) of the leaves below it, and a character's codeword spells the path from the root to its leaf, so the codeword length equals the depth of the leaf. For the code above:

  root
    0: A[70]
    1: [60]
         0: T[37]
         1: [23]
              0: C[3]
              1: G[20]

Huffman encoding algorithm

The tree representation also provides a decoding scheme: start at the root and follow the edge labeled by each successive bit; on reaching a leaf, output its character and return to the root. Because no codeword is a prefix of another, each leaf reached this way marks the end of exactly one codeword.

Properties of the optimal solution:

The optimal solution is represented by a full binary tree, i.e., every internal node has two children (otherwise a node with a single child could be contracted, shortening codewords without breaking the prefix property).

There is an optimal tree in which the two characters with the smallest frequencies are together at the bottom of the tree, as the two children of a deepest internal node (swapping a less frequent character into a deeper leaf never increases the total cost).

Greedy algorithm:

  HUFFMAN(C)
    n := |C|
    Q := C                        (min-priority queue keyed on f)
    for i := 1 to n − 1 do
        allocate a new node z
        left[z] := x := EXTRACT-MIN(Q)
        right[z] := y := EXTRACT-MIN(Q)
        f[z] := f[x] + f[y]
        INSERT(Q, z)
    return EXTRACT-MIN(Q)         (the root of the code tree)

Example:
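As a sketch of HUFFMAN(C) on the chromosome frequencies, here is a minimal Python version. It assumes the standard-library `heapq` module as the min-priority queue Q (the notes leave the queue implementation open), and the names `Node`, `huffman`, `codewords`, and `decode` are hypothetical. A counter breaks ties between equal frequencies so the heap never has to compare nodes. Which child gets the label 0 depends on extraction order, so the bits can differ from the code A : 0, T : 10, C : 110, G : 111, but the codeword lengths and the 213-megabit total agree.

```python
import heapq
from itertools import count

# Frequencies from the notes, in millions of occurrences.
FREQ = {"A": 70, "C": 3, "G": 20, "T": 37}


class Node:
    """Leaf (char is set) or internal node (left and right are set)."""

    def __init__(self, freq, char=None, left=None, right=None):
        self.freq = freq
        self.char = char
        self.left = left
        self.right = right


def huffman(freq):
    """Build the code tree with n - 1 merges of the two smallest frequencies.

    Assumes at least two characters.
    """
    tie = count()  # tie-breaker so heapq never compares Node objects
    q = [(f, next(tie), Node(f, char=c)) for c, f in freq.items()]
    heapq.heapify(q)                            # Q := C, keyed on f
    for _ in range(len(freq) - 1):
        fx, _, x = heapq.heappop(q)             # x := EXTRACT-MIN(Q)
        fy, _, y = heapq.heappop(q)             # y := EXTRACT-MIN(Q)
        z = Node(fx + fy, left=x, right=y)      # f[z] := f[x] + f[y]
        heapq.heappush(q, (z.freq, next(tie), z))  # INSERT(Q, z)
    return heapq.heappop(q)[2]                  # the root


def codewords(node, prefix="", table=None):
    """Read codewords off the tree: 0 for a left edge, 1 for a right edge."""
    if table is None:
        table = {}
    if node.char is not None:
        table[node.char] = prefix
    else:
        codewords(node.left, prefix + "0", table)
        codewords(node.right, prefix + "1", table)
    return table


def decode(bits, root):
    """Follow one edge per bit from the root; emit a character at each leaf."""
    out, node = [], root
    for b in bits:
        node = node.left if b == "0" else node.right
        if node.char is not None:
            out.append(node.char)
            node = root
    return "".join(out)


root = huffman(FREQ)
code = codewords(root)
print(code)   # {'C': '000', 'G': '001', 'T': '01', 'A': '1'}

total = sum(FREQ[c] * len(code[c]) for c in FREQ)
print(total)  # 213 (million bits)

bits = "".join(code[c] for c in "GATTACA")
print(decode(bits, root))  # GATTACA
```

Using a heap makes each EXTRACT-MIN and INSERT cost O(log n), so the n − 1 iterations of the loop take O(n log n) time in total.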