Notes for Huffman Codes, Data Compression - Fundamental Algorithms | CS 473, Study notes of Algorithms and Programming

Material Type: Notes; Class: Fundamental Algorithms; Subject: Computer Science; University: University of Illinois - Urbana-Champaign; Term: Spring 2008;


Uploaded on 03/16/2009

CS 473ug: Algorithms
Mahesh Viswanathan
vmahesh@cs.uiuc.edu
3232 Siebel Center
University of Illinois, Urbana-Champaign
Spring 2008

Part I: Huffman Codes

Outline: Codes; The Problem; Towards a Solution; Huffman Codes

Data Compression

Variable Length Codes
Use encodings of different lengths for different symbols. Giving shorter encodings to more frequent symbols reduces the average number of bits per symbol.

Example
Morse code is a variable-length encoding. It maps e to 0 (dot), t to 1 (dash), a to 01 (dot-dash), ...
What is the text for 0101? It could be etet, aa, eta, or aet!
The ambiguity is removed by adding pauses between letters. But then the encoding is no longer over {0, 1}; it is over {0, 1, 2}.
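The ambiguity of 0101 can be checked mechanically. The following is a minimal sketch that enumerates every way to split a bit string into codewords; the function name `all_decodings` and the three-letter table `MORSE` are illustrative choices, limited to the symbols mentioned in the example.

```python
def all_decodings(bits, code):
    """Return every string whose encoding under `code` is exactly `bits`."""
    if not bits:
        return [""]
    results = []
    for symbol, word in code.items():
        if bits.startswith(word):
            # Consume one codeword, then decode the rest in every possible way.
            for rest in all_decodings(bits[len(word):], code):
                results.append(symbol + rest)
    return results


# Only the three Morse letters used in the example (dot = 0, dash = 1).
MORSE = {"e": "0", "t": "1", "a": "01"}

print(sorted(all_decodings("0101", MORSE)))
```

More than one result means the code is not uniquely decodable without separators.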
Prefix Codes

Definition
A prefix code for a set $S$ is a function $\gamma$ such that
1. for every $x \in S$, $\gamma(x)$ is a bit string;
2. for distinct $x, y \in S$, $\gamma(x)$ is not a prefix of $\gamma(y)$, and vice versa.

Example
Consider $S = \{a, b, c, d, e\}$ with the encoding $\gamma$ given by
$\gamma(a) = 11$, $\gamma(b) = 01$, $\gamma(c) = 001$, $\gamma(d) = 10$, $\gamma(e) = 000$.
The string "bad" is encoded as 01 11 10.

Decoding Prefix Codes

Algorithm
1. Scan the bit sequence from left to right.
2. As soon as the bits read so far match the code of some symbol, output that symbol and continue with the remaining bits.
This is justified because neither a shorter prefix nor a longer extension of a codeword can encode a symbol.

Example
With $S$ and $\gamma$ as above, the sequence 0010000011101 splits as 001 000 001 11 01 and decodes to c e c a b.
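The left-to-right decoding rule can be sketched in a few lines. This is an illustrative version, not the course's reference code; the names `GAMMA`, `encode`, and `decode` are assumptions, and the table is the example code $\gamma$ from the slide.

```python
# The example prefix code from the slide.
GAMMA = {"a": "11", "b": "01", "c": "001", "d": "10", "e": "000"}


def encode(text, code):
    """Concatenate the codewords of the symbols in `text`."""
    return "".join(code[ch] for ch in text)


def decode(bits, code):
    """Scan left to right; emit a symbol as soon as a codeword is matched."""
    inverse = {word: symbol for symbol, word in code.items()}
    out = []
    current = ""
    for bit in bits:
        current += bit
        if current in inverse:      # safe because no codeword is a prefix of another
            out.append(inverse[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a codeword")
    return "".join(out)


print(encode("bad", GAMMA))
print(decode("0010000011101", GAMMA))
```

The greedy match is only correct for prefix codes; applied to the Morse table it would always prefer the shortest codeword and could miss valid decodings.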
Average Bits per Letter

Observation
For a text of $n$ letters in which $x$ occurs $n f_x$ times, the length of the encoded text is
$\sum_{x \in S} n f_x |\gamma(x)| = n \sum_{x \in S} f_x |\gamma(x)|$.

Definition
For an alphabet $S$ with frequency $f_x$ for each symbol $x$ (where $\sum_{x \in S} f_x = 1$), the average number of bits required per letter under the encoding $\gamma$, denoted $\mathrm{ABL}(\gamma)$, is
$\mathrm{ABL}(\gamma) = \sum_{x \in S} f_x |\gamma(x)|$.

ABL: Example

Example
Let $S = \{a, b, c, d, e\}$ with frequencies
$f_a = 0.32$, $f_b = 0.25$, $f_c = 0.20$, $f_d = 0.18$, $f_e = 0.05$.
Consider $\gamma(a) = 11$, $\gamma(b) = 01$, $\gamma(c) = 001$, $\gamma(d) = 10$, $\gamma(e) = 000$. Then
$\mathrm{ABL}(\gamma) = 2 \times 0.32 + 2 \times 0.25 + 3 \times 0.20 + 2 \times 0.18 + 3 \times 0.05 = 2.25$.
Consider $\gamma'(a) = 11$, $\gamma'(b) = 10$, $\gamma'(c) = 01$, $\gamma'(d) = 001$, $\gamma'(e) = 000$. Then
$\mathrm{ABL}(\gamma') = 2 \times 0.32 + 2 \times 0.25 + 2 \times 0.20 + 3 \times 0.18 + 3 \times 0.05 = 2.23$.
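The two ABL values can be recomputed directly from the definition. A small sketch, assuming frequencies are stored as a dictionary; the names `FREQ`, `GAMMA`, `GAMMA_PRIME`, and `abl` are illustrative.

```python
FREQ = {"a": 0.32, "b": 0.25, "c": 0.20, "d": 0.18, "e": 0.05}
GAMMA = {"a": "11", "b": "01", "c": "001", "d": "10", "e": "000"}
GAMMA_PRIME = {"a": "11", "b": "10", "c": "01", "d": "001", "e": "000"}


def abl(freq, code):
    """Average bits per letter: sum over symbols of f_x * |code(x)|."""
    return sum(freq[x] * len(code[x]) for x in freq)


print(round(abl(FREQ, GAMMA), 2), round(abl(FREQ, GAMMA_PRIME), 2))
```

The only difference between the two codes is which of $c$ and $d$ gets the 3-bit codeword; giving it to the less frequent $d$ saves $0.20 - 0.18 = 0.02$ bits per letter.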
Optimal Codes and Full Trees

Definition
A binary tree is full if every internal node has two children.

Proposition
The binary tree corresponding to an optimal code is full.

Proof.
Suppose, for contradiction, that $T$ is the tree of an optimal code and some node $u$ of $T$ has only one child $v$.
Consider the tree $T'$ obtained by removing $u$: if $u$ is the root, make $v$ the root; otherwise, attach $v$ to the parent of $u$.
$T'$ has a smaller ABL than $T$, because the code of every leaf below $u$ has been shortened by 1 bit (assuming these symbols have positive frequency). This contradicts the optimality of $T$.
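Fullness can be tested on the codewords themselves, since the internal nodes of the code tree are exactly the proper prefixes of the codewords. The sketch below assumes the input is already a prefix code; `is_full` is an illustrative name.

```python
def is_full(code):
    """True if every internal node of the code tree has both a 0-child and a 1-child."""
    words = set(code.values())
    # Internal nodes are the proper prefixes of codewords (including the root "").
    internal = {w[:i] for w in words for i in range(len(w))}
    nodes = words | internal
    return all(p + "0" in nodes and p + "1" in nodes for p in internal)


GAMMA = {"a": "11", "b": "01", "c": "001", "d": "10", "e": "000"}
NOT_FULL = {"a": "0", "b": "10"}     # node "1" has only the child "10"
SHORTENED = {"a": "0", "b": "1"}     # the same code after removing that node

print(is_full(GAMMA), is_full(NOT_FULL), is_full(SHORTENED))
```

`SHORTENED` is the code the proof constructs from `NOT_FULL`: the one-child node is removed and the codeword of `b` loses a bit.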
Top-Down Approach

Algorithm [Shannon-Fano]
1. Divide $S$ into $S_1$ and $S_2$ such that the total frequency of each of $S_1$ and $S_2$ is (if possible) $\frac{1}{2}$.
2. Recursively find a code $\gamma_1$ for $S_1$ and a code $\gamma_2$ for $S_2$.
3. The code for $S$ is $\gamma(x) = 0\gamma_1(x)$ if $x \in S_1$ and $\gamma(x) = 1\gamma_2(x)$ if $x \in S_2$.

Example
Consider $S = \{a, b, c, d, e\}$ with $f_a = 0.32$, $f_b = 0.25$, $f_c = 0.2$, $f_d = 0.18$, $f_e = 0.05$.
The first split results in $\{b, c, e\}$ and $\{a, d\}$ (total frequency 0.5 each); codes for the two halves are then found recursively.
The resulting code is $\gamma(a) = 11$, $\gamma(b) = 01$, $\gamma(c) = 001$, $\gamma(d) = 10$, $\gamma(e) = 000$.
$\gamma$ is not optimal; the code $\gamma'$ shown earlier is better (ABL 2.23 versus 2.25).
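The top-down split can be sketched as follows. Two assumptions are mine and not from the slides: the most balanced split is found by brute force over all subsets (exponential, so only suitable for tiny alphabets), and frequencies are given as integer percentages to avoid floating-point ties. Which half receives the leading 0 is arbitrary, so the bit patterns may differ from the slide while the codeword lengths agree.

```python
from itertools import combinations


def shannon_fano(freq):
    """Top-down code construction; `freq` maps symbols to integer weights."""
    symbols = sorted(freq)
    if len(symbols) == 1:
        return {symbols[0]: ""}
    total = sum(freq.values())
    best = None
    # Brute-force search for the split whose two halves are closest in weight.
    for size in range(1, len(symbols)):
        for subset in combinations(symbols, size):
            imbalance = abs(total - 2 * sum(freq[x] for x in subset))
            if best is None or imbalance < best[0]:
                best = (imbalance, subset)
    s1 = set(best[1])
    code1 = shannon_fano({x: freq[x] for x in symbols if x in s1})
    code2 = shannon_fano({x: freq[x] for x in symbols if x not in s1})
    code = {x: "0" + w for x, w in code1.items()}
    code.update({x: "1" + w for x, w in code2.items()})
    return code


PERCENT = {"a": 32, "b": 25, "c": 20, "d": 18, "e": 5}
sf_code = shannon_fano(PERCENT)
print({x: len(w) for x, w in sorted(sf_code.items())})
```

On this input the first split is $\{a, d\}$ against $\{b, c, e\}$, as in the example, and the resulting lengths give the same ABL of 2.25.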
Depth and Frequency

Proposition
Let $T^*$ be the tree of an optimal prefix code. For leaves $u$ and $v$ with labels $x$ and $y$, respectively, if $\mathrm{depth}(u) < \mathrm{depth}(v)$ then $f_x \ge f_y$.

Proof.
Suppose, for contradiction, that $f_x < f_y$. Consider the tree $T^*_1$ in which the labels of the leaves $u$ and $v$ have been exchanged. Then
\[
\begin{aligned}
\mathrm{ABL}(T^*) - \mathrm{ABL}(T^*_1)
&= \sum_{z \in S} f_z \,\mathrm{depth}_{T^*}(z) - \sum_{z \in S} f_z \,\mathrm{depth}_{T^*_1}(z) \\
&= \bigl(\mathrm{depth}(u) f_x + \mathrm{depth}(v) f_y\bigr) - \bigl(\mathrm{depth}(u) f_y + \mathrm{depth}(v) f_x\bigr) \\
&= \bigl(\mathrm{depth}(v) - \mathrm{depth}(u)\bigr)(f_y - f_x) > 0 .
\end{aligned}
\]
So $T^*_1$ is better, which contradicts the optimality of $T^*$.
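A numeric instance of the exchange argument may help. The depth assignment below is a hypothetical tree invented for illustration (it puts the rare symbol $e$ above the frequent symbol $a$); frequencies are integer percentages, and the names are assumptions.

```python
def weighted_depth(freq, depth):
    """ABL scaled by 100: sum of frequency (in percent) times leaf depth."""
    return sum(freq[z] * depth[z] for z in freq)


PERCENT = {"a": 32, "b": 25, "c": 20, "d": 18, "e": 5}

# Hypothetical tree: leaf u holds x = e at depth 2, leaf v holds y = a at depth 3.
before = {"e": 2, "b": 2, "c": 2, "d": 3, "a": 3}
# Exchange the labels of u and v: a moves up to depth 2, e moves down to depth 3.
after = dict(before, a=before["e"], e=before["a"])

saving = weighted_depth(PERCENT, before) - weighted_depth(PERCENT, after)
predicted = (before["a"] - before["e"]) * (PERCENT["a"] - PERCENT["e"])
print(saving, predicted)
```

Both numbers equal $(\mathrm{depth}(v) - \mathrm{depth}(u))(f_y - f_x)$ scaled by 100, matching the last line of the derivation.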
Maximum Depth

Corollary
The least frequent symbol labels a leaf of maximum depth.

Technical Observation

Proposition
Let $x$ and $y$ be the two least frequent elements. Then there is an optimal code $T^*$ in which $x$ and $y$ are siblings.

Proof.
Let $u$ be a leaf of maximum depth in $T^*$, and let $w$ be the parent of $u$.
Since $T^*$ is full, $w$ has another child $v$.
$v$ is a leaf, because $u$ is at maximum depth.
By the depth and frequency proposition, the two least frequent elements can be taken to label the leaves $u$ and $v$ at maximum depth; exchanging labels among leaves of equal depth does not change ABL.
Example

At each step the two least frequent symbols are replaced by a new symbol whose frequency is their sum.
1. $S = \{a, b, c, d, e\}$ with $f_a = 0.32$, $f_b = 0.25$, $f_c = 0.2$, $f_d = 0.18$, $f_e = 0.05$.
2. Merge $d$ and $e$ into $\omega_1$: $S = \{a, b, c, \omega_1\}$ with $f_a = 0.32$, $f_b = 0.25$, $f_c = 0.2$, $f_{\omega_1} = 0.23$.
3. Merge $c$ and $\omega_1$ into $\omega_2$: $S = \{a, b, \omega_2\}$ with $f_a = 0.32$, $f_b = 0.25$, $f_{\omega_2} = 0.43$.
4. Merge $a$ and $b$ into $\omega_3$: $S = \{\omega_2, \omega_3\}$ with $f_{\omega_2} = 0.43$, $f_{\omega_3} = 0.57$.
[Figure: the partial trees at each step, over the leaves e, d, c, b, a.]
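The sequence of merges in this example can be reproduced with a min-heap. This is a sketch using Python's standard `heapq` module; the names `merge_trace` and `w1`, `w2`, ... (standing in for $\omega_1, \omega_2, \ldots$) are illustrative, and frequencies are given as integer percentages so that the sums are exact.

```python
import heapq


def merge_trace(freq):
    """Repeatedly merge the two lowest-frequency symbols; return the merges made."""
    heap = [(f, name) for name, f in freq.items()]
    heapq.heapify(heap)
    trace = []
    step = 0
    while len(heap) > 1:
        fx, x = heapq.heappop(heap)      # least frequent
        fy, y = heapq.heappop(heap)      # second least frequent
        step += 1
        merged = f"w{step}"              # stands in for omega_step
        trace.append((x, y, merged, fx + fy))
        heapq.heappush(heap, (fx + fy, merged))
    return trace


PERCENT = {"a": 32, "b": 25, "c": 20, "d": 18, "e": 5}
for x, y, merged, f in merge_trace(PERCENT):
    print(f"merge {x} and {y} into {merged} with frequency {f}")
```

The sketch performs one more merge than the example shows: the final two symbols are combined into the root with total frequency 100.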
Property about Recursive Step

Lemma
Let $S' = (S \setminus \{x, y\}) \cup \{\omega\}$, let $T'$ be the Huffman code for $S'$, and let $T$ be the Huffman code for $S$. Then $\mathrm{ABL}(T') = \mathrm{ABL}(T) - f_\omega$.

Proof.
$\mathrm{depth}(z)$ for $z \ne x, y$ is the same in both $T$ and $T'$. Moreover, $\mathrm{depth}_T(x) = \mathrm{depth}_T(y) = \mathrm{depth}_{T'}(\omega) + 1$ and $f_\omega = f_x + f_y$. Therefore
\[
\begin{aligned}
\mathrm{ABL}(T)
&= \sum_{z \in S} f_z \,\mathrm{depth}_T(z) \\
&= f_x \,\mathrm{depth}_T(x) + f_y \,\mathrm{depth}_T(y) + \sum_{z \ne x, y} f_z \,\mathrm{depth}_T(z) \\
&= (f_x + f_y)\bigl(1 + \mathrm{depth}_{T'}(\omega)\bigr) + \sum_{z \ne x, y} f_z \,\mathrm{depth}_{T'}(z) \\
&= f_\omega + f_\omega \,\mathrm{depth}_{T'}(\omega) + \sum_{z \ne x, y} f_z \,\mathrm{depth}_{T'}(z) \\
&= f_\omega + \mathrm{ABL}(T') .
\end{aligned}
\]
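As a sanity check, the lemma can be verified on the running example with $x = d$, $y = e$, and $\omega = \omega_1$. The depths used here are the ones implied by the merge sequence of the example ($a, b, c$ at depth 2 and $d, e$ at depth 3 in $T$; all four symbols at depth 2 in $T'$); this worked instance is an illustration, not part of the original proof.

```latex
\begin{align*}
\mathrm{ABL}(T)  &= 2(0.32) + 2(0.25) + 2(0.20) + 3(0.18) + 3(0.05) = 2.23, \\
\mathrm{ABL}(T') &= 2(0.32) + 2(0.25) + 2(0.20) + 2(0.23) = 2.00, \\
\mathrm{ABL}(T) - f_{\omega_1} &= 2.23 - 0.23 = 2.00 = \mathrm{ABL}(T').
\end{align*}
```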
Optimality Proof

Theorem
The Huffman code is an optimal prefix code.

Proof.
By induction on $|S|$.
The Huffman code is optimal when $|S| = 2$.
Assume the Huffman code is optimal for $S$ whenever $|S| < k$.
Suppose, for contradiction, that the Huffman code $T$ for $S$ with $|S| = k$ is not optimal, and let $Z$ be an optimal code, so that $\mathrm{ABL}(Z) < \mathrm{ABL}(T)$.
Let $x, y$ be the two least frequent elements. By the technical observation, we may assume that $x, y$ are siblings in $Z$.
Construct $Z'$ by removing the leaves $x, y$ and labeling their parent $\omega$; then $Z'$ is a code for $S' = (S \setminus \{x, y\}) \cup \{\omega\}$, as is $T'$.
$\mathrm{ABL}(Z') = \mathrm{ABL}(Z) - f_\omega$ and $\mathrm{ABL}(T') = \mathrm{ABL}(T) - f_\omega$.
Hence $\mathrm{ABL}(Z') < \mathrm{ABL}(T')$, which contradicts the optimality of $T'$, the Huffman code for $S'$ with $|S'| = k - 1$.
Implementation and Analysis

if S has two letters then
    encode one as 0 and the other as 1
else
    let x, y be the two lowest-frequency letters
    remove x, y and add ω with f_ω = f_x + f_y to get S'
    recursively find the code T' for S'
    the code T for S is as follows:
        for z ≠ x, y: T(z) = T'(z)
        T(x) = T'(ω)0 and T(y) = T'(ω)1
        (x and y become the children of ω, so their codewords extend T'(ω) by one bit)

Store S in a priority queue with the frequency as key.
Each iteration takes O(log n).
Total time is O(n log n).
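The procedure can be sketched with Python's standard `heapq` module as the priority queue. This is an illustrative, iterative version rather than the recursive one above: it builds codewords bottom-up, so each merge prepends one bit to every codeword already inside the two merged subtrees, which yields the same tree. The function name `huffman` and the tie-breaking counter are assumptions. Copying the codeword dictionaries at each merge makes this sketch slower than the O(n log n) bound, which assumes an explicit tree with O(1) work per merge besides the queue operations.

```python
import heapq
from itertools import count


def huffman(freq):
    """Return a dict mapping each symbol to its codeword (needs >= 2 symbols)."""
    if len(freq) < 2:
        raise ValueError("need at least two symbols")
    tiebreak = count()  # keeps heap entries comparable when frequencies tie
    # Heap entry: (frequency, tiebreak, {symbol: codeword within this subtree}).
    heap = [(f, next(tiebreak), {x: ""}) for x, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        fx, _, left = heapq.heappop(heap)    # lowest frequency
        fy, _, right = heapq.heappop(heap)   # second lowest frequency
        # The two subtrees become the 0-child and 1-child of a new node.
        merged = {s: "0" + w for s, w in left.items()}
        merged.update({s: "1" + w for s, w in right.items()})
        heapq.heappush(heap, (fx + fy, next(tiebreak), merged))
    return heap[0][2]


PERCENT = {"a": 32, "b": 25, "c": 20, "d": 18, "e": 5}
huff_code = huffman(PERCENT)
print({x: len(w) for x, w in sorted(huff_code.items())})
```

On the running example the codeword lengths are 2 for $a, b, c$ and 3 for $d, e$, giving an ABL of 2.23, the same as $\gamma'$.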