Huffman Codes: A Prefix Code for Data Compression, Study notes of Data Compression

Information Theory · Data Compression · Coding Theory

Huffman codes are a prefix code used for data compression, typically saving 20%–90% of the bits required to represent data compared to fixed-length codes. The basic idea is to represent frequently encountered characters by shorter binary codes. These notes cover the concept of Huffman codes, including examples and the optimization of the code tree.

What you will learn

  • How does Huffman coding save bits compared to fixed-length codes?
  • What are the advantages of using Huffman codes for data compression?
  • What is the basic idea behind Huffman codes?
  • How is a Huffman code tree optimized?
  • What is a prefix code?

Typology: Study notes

2021/2022

Uploaded on 09/07/2022

adnan_95 🇮🇶


Huffman codes

- Huffman codes are used for data compression, typically saving 20%–90% of the bits required by a fixed-length encoding.
- Basic idea: represent frequently encountered characters by shorter (binary) codes.

Example

- Suppose we have a data file with 100 characters in total:

      Char.                    a    b    c    d    e     f
      Freq.                    45   13   12   16   9     5
      3-bit fixed-length code  000  001  010  011  100   101
      Variable-length code     0    101  100  111  1101  1100

- Total number of bits required to encode the file:
  - Fixed-length code: 100 × 3 = 300
  - Variable-length code: 1·45 + 3·13 + 3·12 + 3·16 + 4·9 + 4·5 = 224
- The variable-length code saves about 25%.
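The totals in the example can be checked with a few lines of Python; the frequencies and codewords below are taken directly from the table above.

```python
# Frequencies and variable-length codewords from the example table.
freq = {'a': 45, 'b': 13, 'c': 12, 'd': 16, 'e': 9, 'f': 5}
var_code = {'a': '0', 'b': '101', 'c': '100', 'd': '111', 'e': '1101', 'f': '1100'}

fixed_bits = sum(freq.values()) * 3  # every character costs 3 bits
var_bits = sum(f * len(var_code[c]) for c, f in freq.items())

print(fixed_bits, var_bits)  # 300 224
print(1 - var_bits / fixed_bits)  # a saving of roughly 25%
```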
Prefix(-free) codes

1. No codeword is also a prefix¹ of some other codeword.
2. A prefix code for the example:

       Char.  a  b    c    d    e     f
       Code   0  101  100  111  1101  1100

3. Encoding and decoding with a prefix code.
4. Example, continued:
   - Encode: beef → 101110111011100; face → 110001001101
   - Decode: 101110111011100 → beef; 110001001101 → face

¹ A prefix is a word, letter or number placed before another.

Representation of a prefix code

5. A prefix code can be represented by a full binary tree: every nonleaf node has two children.
   - All codewords sit at the leaves, since no codeword is a prefix of another.
6. Example, continued: (a) the (not-full) binary tree corresponding to the fixed-length code; (b) the full binary tree corresponding to the prefix code.
7. Fact: an optimal code for a file is always represented by a full binary tree.

Cost and optimality

Let C be the alphabet (the set of characters). Then:

- A code corresponds to a binary tree T.
- For each character c ∈ C, define
  - f(c) = frequency of c in the file,
  - d_T(c) = length of the codeword for c = number of bits = depth of c's leaf in the tree T.
- The number of bits required to encode the file (the "cost" of the tree/code T) is

      B(T) = ∑_{c ∈ C} f(c) · d_T(c).

- A code T is optimal if B(T) is minimal.
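Prefix-freeness is exactly what makes decoding unambiguous: scan the bits left to right, and the first match against a codeword must be correct. A minimal sketch, using the code table from the example:

```python
code = {'a': '0', 'b': '101', 'c': '100', 'd': '111', 'e': '1101', 'f': '1100'}
decode_map = {w: c for c, w in code.items()}  # invert: codeword -> character

def encode(text: str) -> str:
    return ''.join(code[c] for c in text)

def decode(bits: str) -> str:
    out, buf = [], ''
    for b in bits:
        buf += b
        if buf in decode_map:  # first match is unambiguous for a prefix code
            out.append(decode_map[buf])
            buf = ''
    return ''.join(out)

print(encode('beef'))          # 101110111011100
print(decode('110001001101'))  # face
```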
Huffman codes: greedy construction

Let C be the alphabet. The basic idea of a Huffman code is to produce a prefix code for C that represents frequently encountered characters by shorter (binary) codes, via:

1. Building a full binary tree T in a bottom-up manner.
2. Beginning with |C| leaves, performing a sequence of |C| − 1 "merging" operations to create T.
3. Each "merging" operation is greedy: the two subtrees with the lowest frequencies are merged.
Review: priority queues

- A priority queue is a data structure for maintaining a set S of elements, each with an associated key.
- A min-priority queue supports the following operations:
  - Insert(S, x): inserts the element x into the set S, i.e., S = S ∪ {x}.
  - Minimum(S): returns the element of S with the smallest key.
  - ExtractMin(S): removes and returns the element of S with the smallest key.
  - DecreaseKey(S, x, k): decreases the value of element x's key to the new value k, which is assumed to be at most x's current key value.
- A max-priority queue supports the analogous operations: Insert(S, x), Maximum(S), ExtractMax(S), IncreaseKey(S, x, k).
- Section 6.5 describes a binary heap implementation.
- Cost: let n = |S|; then
  - initialization (building the heap) is O(n),
  - each heap operation is O(lg n).
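As an illustration, Python's heapq module provides a binary-heap min-priority queue directly; it exposes push/pop functions rather than the textbook operation names, but the costs match the bullet above.

```python
import heapq

# Entries are (key, element) pairs; heapq orders by the first tuple item.
S = []
for key, elem in [(13, 'b'), (45, 'a'), (5, 'f'), (9, 'e')]:
    heapq.heappush(S, (key, elem))   # Insert: O(lg n)

print(S[0])               # Minimum: peek at the smallest key -> (5, 'f')
print(heapq.heappop(S))   # ExtractMin: removes and returns (5, 'f')
print(heapq.heappop(S))   # next smallest key -> (9, 'e')
```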
Huffman codes: pseudocode

    Huffmancode(C)
        n = |C|
        Q = C                       // min-priority queue, keyed by the freq attribute
        for i = 1 to n - 1
            allocate a new node z
            z.left  = x = ExtractMin(Q)
            z.right = y = ExtractMin(Q)
            z.freq  = x.freq + y.freq
            Insert(Q, z)
        endfor
        return ExtractMin(Q)        // the root of the tree

- Running time: T(n) = heap initialization + (n − 1) iterations × cost per heap operation = O(n) + O(n lg n) = O(n lg n).

Example

(Figure omitted: the slide showed the step-by-step construction of the Huffman tree for the example frequencies; the image did not survive extraction.)

Optimality

To prove that the greedy algorithm Huffmancode produces an optimal prefix code, we show that the problem exhibits the following two ingredients:

1. The greedy-choice property. If x, y ∈ C have the lowest frequencies, then there exists an optimal code T such that
   - d_T(x) = d_T(y), and
   - the codewords for x and y differ only in the last bit.
2. The optimal-substructure property. If x, y ∈ C have the lowest frequencies, and z is their parent, then the tree T′ = T − {x, y} represents an optimal prefix code for the alphabet C′ = (C − {x, y}) ∪ {z}.

By these two properties, after each greedy choice is made we are left with an optimization problem of the same form as the original. By induction we obtain:

Theorem. A Huffman code is an optimal prefix code.
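The pseudocode maps naturally onto heapq. The sketch below is one possible rendering, not the notes' own implementation: the node representation (a character, or a (left, right) tuple) and the tie-breaking counter are implementation choices added here. On the example frequencies it reproduces the optimal cost of 224 bits.

```python
import heapq
from itertools import count

def huffman(freq):
    """Greedy Huffman construction; returns a {char: codeword} dict."""
    tiebreak = count()  # avoids comparing tree nodes when frequencies tie
    # Heap entries: (frequency, tiebreaker, tree); a tree is a char or (left, right).
    q = [(f, next(tiebreak), c) for c, f in freq.items()]
    heapq.heapify(q)                    # build the heap: O(n)
    for _ in range(len(freq) - 1):      # n - 1 greedy merges
        fx, _, x = heapq.heappop(q)     # ExtractMin twice: the two subtrees
        fy, _, y = heapq.heappop(q)     # with the lowest frequencies
        heapq.heappush(q, (fx + fy, next(tiebreak), (x, y)))
    root = q[0][2]
    # Read codewords off the tree: left edge = 0, right edge = 1.
    codes, stack = {}, [(root, '')]
    while stack:
        node, prefix = stack.pop()
        if isinstance(node, tuple):
            stack.append((node[0], prefix + '0'))
            stack.append((node[1], prefix + '1'))
        else:
            codes[node] = prefix or '0'  # single-character alphabet edge case
    return codes

freq = {'a': 45, 'b': 13, 'c': 12, 'd': 16, 'e': 9, 'f': 5}
codes = huffman(freq)
print(sum(freq[c] * len(codes[c]) for c in freq))  # 224, the optimal cost B(T)
```

The exact codewords may differ from the table in the example (Huffman trees are not unique), but every Huffman tree for these frequencies has the same minimal cost.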

Copyright © 2024 Ladybird Srl - Via Leonardo da Vinci 16, 10126, Torino, Italy - VAT 10816460017 - All rights reserved