
Huffman Code, Study notes of Data Structures and Algorithms

Coding: assigning binary codewords to (blocks of) source symbols. Variable-length codes (VLC). Tree codes (prefix codes) are instantaneous.

Typology: Study notes

2021/2022

Uploaded on 09/07/2022

adnan_95


Lecture 6: Huffman Code
Thinh Nguyen
Oregon State University

Review

Coding: assigning binary codewords to (blocks of) source symbols.
- Variable-length codes (VLC).
- Tree codes (prefix codes) are instantaneous.

In practice…

Use some nice algorithm to find the codes:
- Huffman coding
- Tunstall coding
- Golomb coding

Huffman Average Code Length

Input: Probabilities $p_1, p_2, \ldots, p_m$ for symbols $a_1, a_2, \ldots, a_m$, respectively.

Output: A tree that minimizes the average number of bits (bit rate) to code a symbol. That is, it minimizes

$$\bar{l} = \sum_{i=1}^{m} p_i l_i$$

where $l_i$ is the length of the codeword for $a_i$.

Huffman Coding

Two-step algorithm:
1. Iterate:
   - Merge the least probable symbols.
   - Sort.
2. Assign bits.

[Figure: worked example with four symbols (a, b, c, d) and probabilities 0.5, 0.25, 0.125, 0.125. Repeated merge and sort steps reduce the list to two entries; bits are then assigned back through the merges, giving the codewords 0, 1, then 0, 10, 11, and finally 0, 10, 110, 111.]

More Examples of Huffman Code

[Figures: further examples of Huffman codes.]

Average Huffman Code Length

The average number of bits per symbol is

$$0.4 \times 1 + 0.1 \times 4 + 0.3 \times 2 + 0.1 \times 3 + 0.1 \times 4 = 2.1$$

The codewords 1110, 10, 110 and 1111 shown in the figure correspond to the 4-, 2-, 3- and 4-bit terms of this sum.

Condition 2: The two least probable letters have codewords with the same maximum length $l_m$.

Easy to see why?

Proof by contradiction: Suppose there is an optimal code X in which the two codewords with the lowest probabilities are $c_i$ and $c_j$, and $c_i$ is longer than $c_j$ by $k$ bits. Because this is a prefix code, $c_j$ cannot be a prefix of $c_i$, so we can drop the last $k$ bits of $c_i$. Dropping the last $k$ bits of $c_i$ still leaves a decodable codeword: $c_i$ and $c_j$ belong to the least probable letters and so have the longest lengths, hence they cannot be a prefix of any other codeword. By dropping the $k$ bits of $c_i$, we create a new code Y with a shorter average length than X, which contradicts the optimality of X.

Condition 3: In the tree corresponding to the optimum code, there must be two branches stemming from each intermediate node.

Easy to see why?
If there were any intermediate node with only one branch coming from that node, we could remove it without affecting the decodability of the code while reducing its average length.

[Figure: a code tree containing a single-branch intermediate node (a: 000, b: 001, c: 1), and the same tree after that node is removed (a: 00, b: 01, c: 1).]

Condition 4: Suppose we change an intermediate node into a leaf node by combining all the leaves descending from it into a composite word of a reduced alphabet. Then if the original tree was optimal for the original alphabet, the reduced tree is optimal for the reduced alphabet.

[Figure: the original tree (a: 000, b: 001, c: 01, d: 1) and the reduced tree (e: 00, c: 01, d: 1), in which the composite symbol e replaces the leaves a and b.]

Extended Huffman Code

$$H(S) \le \bar{l} < H(S) + \frac{1}{n}$$

$H(S)$: entropy of the source
$\bar{l}$: average length of the Huffman code

Proof: page 53 of the book.

$$A = \{a_1, a_2, \ldots, a_m\}, \qquad A^n = \{\underbrace{a_1 a_1 \cdots a_1}_{n \text{ times}},\; a_1 a_1 \cdots a_2,\; \ldots,\; a_m a_m \cdots a_m\}$$

There are $m^n$ symbols in the alphabet $A^n$.

Huffman Coding: Pros and Cons

+ Fast implementations.
+ Error resilient: resynchronizes in ~ $l^2$ steps.
- The code tree grows exponentially when the source is extended.
- The symbol probabilities are built into the code.

It is hard to use Huffman coding for extended sources / large alphabets, or when the symbol probabilities vary over time.

Huffman Coding of 16-bit CD-quality audio

Filename           Original file size (bytes)   Entropy (bits)   Compressed file size (bytes)   Compression ratio
Mozart symphony    939,862                      12.8             725,420                        1.30
Folk rock (Cohn)   402,442                      13.8             349,300                        1.15

Huffman coding of the Differences

Filename           Original file size (bytes)   Entropy (bits)   Compressed file size (bytes)   Compression ratio
Mozart symphony    939,862                      9.7              569,792                        1.65
Folk rock (Cohn)   402,442                      10.4             261,590                        1.54
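The two-step procedure from the Huffman Coding slide (repeatedly merge the two least probable symbols, then assign bits) can be sketched in Python as follows. This is only an illustrative sketch, not the lecture's own implementation: the function names `huffman_code`, `average_length` and `entropy` are made up for this example, a min-heap stands in for the explicit "sort" step, and the pairing of the symbols a, b, c, d with the probabilities 0.5, 0.25, 0.125, 0.125 is an assumption, since the slide figure does not survive in readable form.

```python
import heapq
import math
from itertools import count


def huffman_code(probs):
    """Return {symbol: codeword} for a dict {symbol: probability}.

    Illustrative sketch: a min-heap replaces the explicit sort step.
    A one-symbol alphabet yields the empty codeword.
    """
    tie = count()  # tie-breaker so the heap never has to compare dicts
    heap = [(p, next(tie), {sym: ""}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge: take the two least probable (possibly composite) symbols.
        p0, _, group0 = heapq.heappop(heap)
        p1, _, group1 = heapq.heappop(heap)
        # Assign bits: prepend 0 to one group and 1 to the other.
        merged = {s: "0" + w for s, w in group0.items()}
        merged.update({s: "1" + w for s, w in group1.items()})
        heapq.heappush(heap, (p0 + p1, next(tie), merged))
    return heap[0][2]


def average_length(probs, code):
    """Average number of bits per symbol: sum of p_i * l_i."""
    return sum(p * len(code[s]) for s, p in probs.items())


def entropy(probs):
    """Source entropy H(S) in bits per symbol."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)


# Assumed pairing of symbols and probabilities (hypothetical, see lead-in).
probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
code = huffman_code(probs)
print(code)  # codewords 0, 10, 110, 111 for a, b, c, d
print(average_length(probs, code), entropy(probs))
```

For these dyadic probabilities the average length equals the entropy, which is the lower end of the bound $H(S) \le \bar{l} < H(S) + 1/n$ with $n = 1$. Ties between equal probabilities can be broken differently, so other implementations may return different codewords with the same average length.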