Text Compression: Understanding Huffman Coding, lecture notes for Data Structures and Algorithms

An introduction to text compression using Huffman coding. It covers the basics of Huffman coding, including the encoding process, building a Huffman tree, and decoding a message. The document also touches on the history of Huffman coding and its complexities.

CPS 100 8.1  Text Compression: Examples

"abcde" in the different formats:

  Symbol   ASCII      Fixed length   Var. length
  a        01100001   000            000
  b        01100010   001            11
  c        01100011   010            01
  d        01100100   011            001
  e        01100101   100            10

  ASCII: 01100001011000100110001101100100…
  Fixed: 000001010011100
  Var:   000110100110

[Figure: two binary code trees over a, b, c, d, e, one for the fixed-length code and one for the variable-length code, with 0 labelling left branches and 1 labelling right branches.]

Encodings:
- ASCII: 8 bits/character
- Unicode: 16 bits/character

CPS 100 8.2  Huffman coding: go go gophers

- Encoding uses the tree: 0 = left, 1 = right
- How many bits? 37!!
- Savings? Worth it?

  Char   ASCII (dec)   ASCII (bin)   3 bits   Huffman
  g      103           1100111       000      00
  o      111           1101111       001      01
  p      112           1110000       010      1100
  h      104           1101000       011      1101
  e      101           1100101       100      1110
  r      114           1110010       101      1111
  s      115           1110011       110      101
  sp.     32           0100000       111      100

[Figure: the Huffman tree for "go go gophers", built by repeatedly combining the lightest subtrees: (p 1 + h 1) = 2, (e 1 + r 1) = 2, (s 1 + * 2) = 3, (2 + 2) = 4, (g 3 + o 3) = 6, (3 + 4) = 7, (6 + 7) = 13, where * stands for the space character.]

CPS 100 8.3  Huffman Coding

- D. A. Huffman, early 1950s
- Before compressing data, analyze the input stream
- Represent data using variable-length codes
- The variable-length codes are prefix codes:
  - Each letter is assigned a codeword
  - The codeword for a given letter is produced by traversing the Huffman tree
  - Property: no codeword produced is a prefix of another
  - Letters appearing frequently have short codewords, while those that appear rarely have longer ones
- Huffman coding is an optimal per-character coding method

CPS 100 8.4  Building a Huffman tree

- Begin with a forest of single-node trees (leaves)
  - Each node/tree/leaf is weighted with a character count
  - A node stores two values: character and count
  - There are n nodes in the forest, where n is the size of the alphabet
- Repeat until only one node is left, the root of the tree:
  - Remove the two minimally weighted trees from the forest
  - Create a new tree with the minimal trees as children
    - The new root's weight is the sum of its children's weights (the character is ignored)
- Does this process terminate? How do we get minimal trees?
- Remove minimal trees, hummm… a priority queue does this efficiently (see the first sketch after these notes)

CPS 100 8.5-8.8  Building a tree

"A SIMPLE STRING TO BE ENCODED USING A MINIMAL NUMBER OF BITS"

Character counts for the string, i.e. the initial forest of weighted leaves:

  sp. 11   I 6   E 5   N 5   S 4   M 4   A 3   B 3   O 3   T 3
  G 2   D 2   L 2   R 2   U 2   C 1   F 1   P 1

[Figure: successive frames combine the two lightest trees in the forest, first pairs of weight-1 leaves into trees of weight 2, then trees of weight 3 and 4, and so on.]

CPS 100 8.17-8.21  Decoding a message

[Figure: the finished Huffman tree for the string above (root weight 60), shown while an encoded bit string such as 01100000100001001101 is consumed from the front. Each step starts at the root, follows 0 = left and 1 = right until a leaf is reached, emits that leaf's character, and restarts at the root; the frames decode the leading characters "GOOD". See the decoding sketch after these notes.]

CPS 100 8.22  Huffman coding: go go gophers

- Choose the two smallest weights
- Combine the nodes and add their weights
- Repeat
- A priority queue?
- Encoding uses the tree: 0 = left, 1 = right
- How many bits?

  Char   ASCII (dec)   ASCII (bin)   3 bits   Huffman
  g      103           1100111       000      ??
  o      111           1101111       001      ??
  p      112           1110000       010
  h      104           1101000       011
  e      101           1100101       100
  r      114           1110010       101
  s      115           1110011       110
  sp.     32           0100000       111

[Figure: intermediate forests for "go go gophers" as the lightest trees are combined, ending with the complete tree of weight 13 from which the codes on slide 8.2 are read off.]

CPS 100 8.23-8.28  Huffman Tree 2

- "A SIMPLE STRING TO BE ENCODED USING A MINIMAL NUMBER OF BITS"
- E.g. " A SIMPLE" ⇔ "10101101001000101001110011100000"

[Figure: successive frames highlight, in the finished tree, the root-to-leaf path for each character of " A SIMPLE" as it is encoded.]
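The loop described on slides 8.4 and 8.22 maps directly onto a priority queue of weighted trees. Below is a minimal sketch, assuming Python's heapq as the priority queue; the Node class, the build_tree/codewords names, and the tie-breaking counter are illustrative choices for this example, not the course's own code. It rebuilds the "go go gophers" code and reproduces the 37-bit total from slide 8.2.

```python
import heapq
from collections import Counter

class Node:
    """One tree in the forest: a leaf holds a character, an internal node holds None."""
    def __init__(self, weight, char=None, left=None, right=None):
        self.weight, self.char, self.left, self.right = weight, char, left, right

def build_tree(text):
    # Begin with a forest of single-node trees, one per character,
    # each weighted with that character's count.
    counts = Counter(text)
    heap = []
    for i, (ch, w) in enumerate(counts.items()):
        # The counter i breaks ties so Node objects are never compared directly.
        heapq.heappush(heap, (w, i, Node(w, ch)))
    next_id = len(heap)
    # Repeat until only one tree is left: remove the two minimally weighted
    # trees and combine them under a new root whose weight is their sum.
    while len(heap) > 1:
        w1, _, t1 = heapq.heappop(heap)
        w2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next_id, Node(w1 + w2, None, t1, t2)))
        next_id += 1
    return heap[0][2]

def codewords(node, prefix="", table=None):
    # Traverse the tree: 0 for a left branch, 1 for a right branch.
    # Each leaf's path from the root is its codeword, so no codeword
    # is a prefix of another.
    if table is None:
        table = {}
    if node.char is not None:
        table[node.char] = prefix or "0"   # degenerate one-symbol alphabet
    else:
        codewords(node.left, prefix + "0", table)
        codewords(node.right, prefix + "1", table)
    return table

if __name__ == "__main__":
    text = "go go gophers"
    table = codewords(build_tree(text))
    encoded = "".join(table[ch] for ch in text)
    print(table)
    print(len(encoded), "bits")   # 37 bits, as on slide 8.2
```

The exact codewords can differ from the slide when weights tie, but any tie-breaking still yields an optimal code of the same total length.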



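Decoding (slides 8.17-8.21) is the reverse walk: follow one tree edge per bit and emit a character at each leaf. The sketch below is again an illustrative Python example rather than the course's code; tree_from_codes and decode are hypothetical helper names, and the code table is the variable-length "abcde" code from slide 8.1, so decoding 000110100110 yields "abcde".

```python
def tree_from_codes(codes):
    # Rebuild the prefix-code tree as nested dicts: internal nodes are dicts
    # keyed by '0'/'1', leaves are the decoded characters.
    root = {}
    for ch, bits in codes.items():
        node = root
        for b in bits[:-1]:
            node = node.setdefault(b, {})
        node[bits[-1]] = ch
    return root

def decode(bits, codes):
    # Walk the tree: one edge per bit (0 = left, 1 = right); emit the
    # character at each leaf and restart at the root.
    root = tree_from_codes(codes)
    out, node = [], root
    for b in bits:
        node = node[b]
        if isinstance(node, str):
            out.append(node)
            node = root
    return "".join(out)

if __name__ == "__main__":
    # The variable-length code from the "abcde" table on slide 8.1.
    var_code = {"a": "000", "b": "11", "c": "01", "d": "001", "e": "10"}
    print(decode("000110100110", var_code))   # -> abcde
```

Because the code is a prefix code, no lookahead is ever needed: the first leaf reached is always the right character.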