
Data Compression: Trie and Huffman Encoding - Prof. Sugih Jamin, Study notes of Algorithms and Programming

Study notes on tries and Huffman encoding for data compression. A trie is a tree-like data structure that associates keys with edges instead of nodes. Huffman encoding is a variable-length prefix code that uses a prefix tree (the Huffman tree) for entropy encoding. The notes also discuss the implementation and time complexity of Huffman encoding.

Typology: Study notes

Pre 2010

Uploaded on 09/02/2009

koofers-user-rc6

Outline

PA1 past due

Last time:
• Binary Search Tree (BST)
• Binary Space Partition (BSP) Tree
• MinHeap and MaxHeap
• Priority queue

Today:
• Trie
• RLE and Huffman encoding

Sugih Jamin (jamin@eecs.umich.edu)

Trie

trie: from retrieval; pronounced to rhyme with "try", to differentiate it from "tree"

A trie is a tree that uses parts of the key, as opposed to the whole key, to perform search.

Whereas a tree associates keys with nodes, a trie associates keys with edges.

Example: the trie T for the set of strings {on, owe, owl, tip, to} (each letter labels the edge into its node):

T
├── o
│   ├── n
│   └── w
│       ├── e
│       └── l
└── t
    ├── i
    │   └── p
    └── o

Note: the handout's external node is our leaf node, not our external (null) node.

Run Length Encoding

A very simple encoding method: for each run of a repeated element, output the run's count followed by the element.

Example: 11111111000001111000011110000011111110000
Output: 8150414041507140 (eight 1s, five 0s, four 1s, four 0s, four 1s, five 0s, seven 1s, four 0s)

Huffman Encoding

Example string: If a woodchuck could chuck wood!

ASCII encoding: 8 bits/char × 32 chars ⇒ requires 256 bits to encode (store in binary) the string.

Observe: there are only 13 distinct symbols in the example string, so 4 bits/char is sufficient (2^4 = 16 ≥ 13) ⇒ requires 128 bits.

Huffman encoding's main ideas:
• variable-length encoding: use a different number of bits (code length) to represent different symbols
• entropy encoding: assign shorter codes to more frequently occurring symbols

Goal: minimize Σ l(c) · f(c) over all unique symbols c in the string, where l(c) is the length of c's code and f(c) is its frequency.

Huffman Encoding (contd)

The string "If a woodchuck could chuck wood!" can be encoded using, for example, the following codes:

symbol c   freq. f(c)   code C(c)
I          1            11111
f          1            11110
a          1            11101
l          1            11100
!          1            1101
w          2            1100
d          3            101
u          3            100
h          2            0111
k          2            0110
o          5            010
c          5            001
' '        5            000

This takes only 111 bits to encode the string (vs. 256 bits for ASCII and 128 bits for a fixed 4-bit code).

Huffman Tree Construction (contd)

Characteristics of Huffman trees:
• higher-frequency symbols sit at shallower depths
• since all symbols are leaf nodes, no code is a prefix of another

Construct the Huffman tree from the |Σ| elements (|Σ|: alphabet size):
• store the elements in a MinHeap, where the "key" is the frequency of occurrence of each element of Σ
• take the two smallest elements off the MinHeap, O(log |Σ|)
• make a tree of them, with the key of the new root node being the sum of the keys of the two children, O(1)
• put the new tree back into the MinHeap, O(log |Σ|)
• repeat until a single tree remains (|Σ| − 1 merges)

Total construction time: O(|Σ| log |Σ|)

Encoding Time Complexity

Running times, where n is the string length and |Σ| is the alphabet size:
• frequency count: O(n)
• Huffman tree construction: O(|Σ| log |Σ|)
• Total time: O(n + |Σ| log |Σ|)

For binary data, treat each byte as a "character".

Compressing the Huffman Code Table

The Huffman code for any particular text is not unique. For example, the following are all acceptable:

symbol c   freq. f(c)   C(c)    C′(c)   C′′(c)
' '        5            000     001     000
c          5            001     010     001
d          3            101     100     010
o          5            010     000     011
u          3            100     101     100
!          1            1101    0110    1010
h          2            0111    1101    1011
k          2            0110    1100    1100
w          2            1100    0111    1101
a          1            11101   11100   11100
f          1            11110   11101   11101
l          1            11100   11111   11110
I          1            11111   11110   11111

The last column can be compressed into: 3' 'cdou4!hkw5aflI

That is, each code length is followed by the symbols that have that length, in order. The C′′ codes can be regenerated by assigning consecutive binary values to the symbols in that order, shifting the running value left by one bit each time the code length increases.
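The trie on the Trie slide can be sketched in Python as follows. This is a minimal illustration, assuming a dict per node to hold the labeled outgoing edges and a hypothetical `is_word` flag to mark where a stored key ends; the names `TrieNode`, `insert`, and `contains` are illustrative and do not come from the notes.

```python
class TrieNode:
    """One trie node; each dict key is the label on an outgoing edge."""

    def __init__(self) -> None:
        self.children: dict[str, "TrieNode"] = {}
        self.is_word = False  # hypothetical flag: a stored key ends at this node


def insert(root: TrieNode, key: str) -> None:
    """Follow (creating as needed) one edge per character of key."""
    node = root
    for ch in key:
        node = node.children.setdefault(ch, TrieNode())
    node.is_word = True


def contains(root: TrieNode, key: str) -> bool:
    """Follow the edges spelled by key; succeed only at a marked node."""
    node = root
    for ch in key:
        if ch not in node.children:
            return False
        node = node.children[ch]
    return node.is_word


root = TrieNode()
for word in ["on", "owe", "owl", "tip", "to"]:
    insert(root, word)

print(sorted(root.children))                        # ['o', 't']
print(sorted(root.children["o"].children))          # ['n', 'w']
print(contains(root, "owl"), contains(root, "ow"))  # True False
```

Search only ever inspects one edge per key character, so a lookup costs time proportional to the key length rather than to the number of stored keys.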
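The run length encoding example can be reproduced with a short sketch that uses `itertools.groupby` to find consecutive runs and writes each run as its count followed by the element, matching the slide's output format. The `rle_decode` helper is a hypothetical inverse that assumes every count is a single digit.

```python
from itertools import groupby


def rle_encode(data: str) -> str:
    """Emit each run as its length followed by the repeated element."""
    # groupby yields (element, iterator over that run) for consecutive equal items
    return "".join(f"{len(list(run))}{elem}" for elem, run in groupby(data))


def rle_decode(encoded: str) -> str:
    """Inverse of rle_encode, assuming every run length is a single digit."""
    return "".join(elem * int(n) for n, elem in zip(encoded[::2], encoded[1::2]))


bits = "11111111000001111000011110000011111110000"
print(rle_encode(bits))  # 8150414041507140
```

With this plain format a run of ten or more would be ambiguous ("101" could mean ten 1s or one 0 then a stray 1), so a practical encoder would use a fixed-width count or a delimiter.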

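The bit counts on the Huffman Encoding slides (256 for ASCII, 128 for a fixed 4-bit code, 111 for the variable-length code) follow from the symbol frequencies. The sketch below recomputes them, copying the C(c) table from the slide and evaluating Σ l(c) · f(c) directly.

```python
from collections import Counter
from math import ceil, log2

text = "If a woodchuck could chuck wood!"

# Code table C(c) copied from the "Huffman Encoding (contd)" slide
code = {
    "I": "11111", "f": "11110", "a": "11101", "l": "11100", "!": "1101",
    "w": "1100", "d": "101", "u": "100", "h": "0111", "k": "0110",
    "o": "010", "c": "001", " ": "000",
}

freq = Counter(text)                                # f(c) for each symbol c
ascii_bits = 8 * len(text)                          # 8 bits/char
fixed_bits = ceil(log2(len(freq))) * len(text)      # smallest fixed width for 13 symbols
huffman_bits = sum(len(code[c]) * f for c, f in freq.items())  # Σ l(c) · f(c)

print(len(text), len(freq))                  # 32 13
print(ascii_bits, fixed_bits, huffman_bits)  # 256 128 111
```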

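The construction steps on the Huffman Tree Construction slide can be sketched with Python's `heapq` module standing in for the MinHeap. The tie-breaking counter is an implementation assumption: it keeps the heap from ever comparing two trees when their frequencies are equal. Because ties can be broken differently, the codes produced need not match the slide's table, but every Huffman code for a given text has the same total length, 111 bits for the example string. The function name `huffman_code` is hypothetical.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_code(text: str) -> dict[str, str]:
    """Build a Huffman tree with a min-heap and read codes off its leaves."""
    freq = Counter(text)                   # frequency count, O(n)
    tie = count()                          # tie-breaker so the heap never compares trees
    # Heap entries: (key = frequency, tie, tree); a tree is a symbol or a (left, right) pair
    heap = [(f, next(tie), sym) for sym, f in freq.items()]
    heapq.heapify(heap)
    if len(heap) == 1:                     # edge case: a single distinct symbol gets "0"
        return {heap[0][2]: "0"}
    while len(heap) > 1:                   # |Σ| − 1 merges
        f1, _, t1 = heapq.heappop(heap)    # two smallest keys, O(log |Σ|) each
        f2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tie), (t1, t2)))  # new root key = sum

    codes: dict[str, str] = {}

    def walk(tree, prefix: str) -> None:
        if isinstance(tree, tuple):        # internal node: 0 = left, 1 = right
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:                              # leaf: a symbol
            codes[tree] = prefix

    walk(heap[0][2], "")
    return codes


text = "If a woodchuck could chuck wood!"
codes = huffman_code(text)
print(sum(len(codes[ch]) for ch in text))  # 111
```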

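Because all symbols are leaves of the Huffman tree, no code is a prefix of another, so a decoder can read bits left to right and emit a symbol as soon as the bits read so far match a codeword. A minimal sketch using the slide's C(c) table; `decode` is a hypothetical helper name.

```python
def decode(bits: str, code: dict[str, str]) -> str:
    """Decode a bit string produced with a prefix-free code by greedy matching."""
    lookup = {v: k for k, v in code.items()}  # codeword -> symbol
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in lookup:                     # prefix-free: the first match is the only one
            out.append(lookup[buf])
            buf = ""
    if buf:
        raise ValueError("trailing bits do not form a complete codeword")
    return "".join(out)


# Code table C(c) copied from the "Huffman Encoding (contd)" slide
code = {
    "I": "11111", "f": "11110", "a": "11101", "l": "11100", "!": "1101",
    "w": "1100", "d": "101", "u": "100", "h": "0111", "k": "0110",
    "o": "010", "c": "001", " ": "000",
}
text = "If a woodchuck could chuck wood!"
bits = "".join(code[ch] for ch in text)
print(len(bits))           # 111
print(decode(bits, code))  # If a woodchuck could chuck wood!
```

Walking the Huffman tree bit by bit from the root is the equivalent approach; the dictionary lookup here is just a compact way to express the same greedy match.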
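The compressed table 3' 'cdou4!hkw5aflI works because C′′ follows the canonical-code convention: within each length, symbols get consecutive binary values, and the running value is shifted left when the length grows. The sketch below rebuilds C′′ from the groups, assuming they are listed in increasing code length and that splitting the compressed string into (length, symbols) pairs has already been done (here by hand). The name `canonical_codes` is hypothetical.

```python
def canonical_codes(groups: list[tuple[int, str]]) -> dict[str, str]:
    """Rebuild codes from (code length, symbols in order) groups, shortest first."""
    codes: dict[str, str] = {}
    value, prev_len = 0, groups[0][0]
    for length, symbols in groups:
        value <<= length - prev_len        # lengthen the running code value
        prev_len = length
        for sym in symbols:
            codes[sym] = format(value, f"0{length}b")  # zero-padded binary, `length` bits
            value += 1
    return codes


# The slide's compressed table "3' 'cdou4!hkw5aflI", split into groups by hand
table = [(3, " cdou"), (4, "!hkw"), (5, "aflI")]
codes = canonical_codes(table)
print(codes["u"], codes["!"], codes["a"], codes["I"])  # 100 1010 11100 11111
```

Only the lengths and the symbol order need to be stored; the codewords themselves are implied, which is what makes this table cheaper to transmit than the C(c) or C′(c) columns.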