Huffman Codes: Data Structures and Algorithms Project 2 - Prof. David J. Galles

Instructions for Project 2 of the CS245-2009S Data Structures and Algorithms course at the University of San Francisco. The project focuses on Huffman codes, a method for compressing data by assigning variable-length codes to characters based on their frequency. Topics include ASCII codes, representing codes as trees, prefix codes, variable-length codes, file length, decoding files, file compression, and Huffman coding. Students are expected to use this information to complete the project.

Typology: Study Guides, Projects, Research

Pre 2010

Uploaded on 07/30/2009

koofers-user-i8y-1

Data Structures and Algorithms
CS245-2009S-P2
Huffman Codes
Project 2
David Galles
Department of Computer Science
University of San Francisco

P2-0: Text Files
All files are represented as binary digits, including text files
Each character is represented by an integer code
ASCII: American Standard Code for Information Interchange
A text file is a sequence of binary digits that represent the codes for each character

P2-3: ASCII
ASCII is not terribly efficient
All characters require 8 bits
Frequently used characters require the same number of bits as infrequently used characters
We could be more efficient if frequently used characters required fewer than 8 bits, and less frequently used characters required more bits

P2-5: Representing Codes as Trees
Want to encode only 4 characters: a, b, c, d (instead of 256 characters)
How many bits are required for each code, if each code has the same length?
2 bits are required, since there are 4 possible options to distinguish

P2-8: Representing Codes as Trees
a: 01, b: 00, c: 11, d: 10
[Figure: binary code tree with leaves a, b, c, d and edges labeled 0 and 1]

P2-9: Prefix Codes
If no code is a prefix of any other code, then decoding the file is unambiguous
If all codes are the same length, then no code will be a prefix of any other code (trivially)
We can create variable length codes, where no code is a prefix of any other code

P2-10: Variable Length Codes
Variable length code example:
a: 0, b: 100, c: 101, d: 11
Decoding examples:
100
10011
01101010010011

P2-13: File Length
If we use the code: a: 00, b: 01, c: 10, d: 11
How many bits are required to encode a file of 20 characters?
20 characters * 2 bits/character = 40 bits

P2-15: File Length
If we use the code: a: 0, b: 100, c: 101, d: 11
How many bits are required to encode a file of 20 characters?
It depends upon the number of a's, b's, c's and d's in the file

P2-20: Decoding Files
We can use variable length codes to encode a text file
Given the encoded file, and the tree representation of the codes, it is easy to decode the file
Finding the kth character in the file is trickier
Need to decode the first (k-1) characters in the file, to determine where the kth character is in the file
[Figure: code tree with leaves a, b, c, d and edges labeled 0 and 1]
Encoded example: 0111001010011

P2-23: Huffman Coding
For each code tree, we keep track of the total number of times the characters in that tree appear in the input file
We start with one code tree for each character that appears in the input file
We combine the two trees with the lowest frequency, until all trees have been combined into one tree

P2-28: Huffman Coding
Example: If the letters a-e have the frequencies:
a: 100, b: 20, c: 15, d: 30, e: 1
[Figure: Huffman tree with leaves a:100, d:30, c:15, b:20, e:1 and interior nodes :16, :36, :66, :166]

P2-33: Huffman Coding
Example: If the letters a-e have the frequencies:
a: 10, b: 10, c: 10, d: 10, e: 10
[Figure: Huffman tree with leaves a:10, d:10, c:10, b:10, e:10 and interior nodes :20, :20, :30, :30]

P2-34: Huffman Trees & Tables
Once we have a Huffman tree, decoding a file is straightforward, but encoding a file requires a bit more information
Given just the tree, finding an encoding can be difficult ...
What would we like to have, to help with encoding?

P2-35: Encoding Tables
a 0
b 100
c 1010
d 11
e 1011
[Figure: Huffman tree with leaves a:100, d:30, c:15, b:20, e:1 and interior nodes :16, :36, :66, :166]

P2-38: Huffman Coding
To uncompress a file using Huffman coding:
Read in the Huffman tree from the input file
Read the input file bit by bit, traversing the Huffman tree as you go
When a leaf is reached, write the appropriate character to the output file

P2-39: Binary Files
public BinaryFile(String filename, char readOrWrite)
public boolean EndOfFile()
public char readChar()
public void writeChar(char c)
public boolean readBit()
public void writeBit(boolean bit)
public void close()

P2-40: Binary Files
readBit: Read a single bit
readChar: Read a single character (8 bits)

P2-43: Binary Files
If we write to a binary file: bit, bit, char, bit, int
And then read from the file: bit, char, bit, int, bit
What will we get out?
Garbage! (except for the first bit)

P2-44: Printing out Trees
To print out Huffman trees:
Print out nodes in pre-order traversal
Need a way of denoting which nodes are leaves and which nodes are interior nodes (Huffman trees are full: every node has 0 or 2 children)
For each interior node, print out a 0 (single bit)
For each leaf, print out a 1, followed by 8 bits for the character at the leaf

P2-45: Compression?
Is it possible that Huffman compression would not compress the file?
Is it possible that Huffman compression could actually make the file larger?
How?

P2-48: Compression!
What to do?
Calculate the size of the input file
Calculate the size that the compressed file would be
If the compressed file is larger than the input file, don't compress

P2-50: Compression!
Given the frequency array, how large is the input file?
$\sum_{c} \mathrm{freq}(c) \cdot \mathrm{len}(c)$
(# of characters in the input file) * 8

P2-53: Command Line Arguments
public static void main(String args[])
The args parameter holds the input parameters
java MyProgram arg1 arg2 arg3
args.length = 3
args[0] = "arg1"
args[1] = "arg2"
args[2] = "arg3"

P2-54: Calling Huffman
java Huffman (-c|-u) [-v] [-f] infile outfile
(-c|-u) stands for either "-c" (for compress), or "-u" (for uncompress)
[-v] stands for an optional "-v" flag (for verbose)
[-f] stands for an optional "-f" flag (for force compress)
infile is the input file
outfile is the output file
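
The Huffman Coding, Encoding Tables, and Compression! slides describe three steps: combine the two lowest-frequency trees until one tree remains, derive an encoding table from that tree, and sum freq(c) * len(c) to estimate the compressed size. The following is a minimal sketch of those steps in Java (8 or later), not the project's required design. The class name HuffmanSketch, the Node type, and every method name are hypothetical. Bits are kept in a String of '0' and '1' characters for readability, so the course's BinaryFile class is not used, and a java.util.PriorityQueue is assumed as the way to find the two smallest trees.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

// Illustrative sketch only: all names here are hypothetical, not course code.
class HuffmanSketch {

    // A tree node: leaves carry a character, interior nodes have two children.
    static final class Node {
        final char ch;
        final int freq;
        final Node left, right;

        Node(char ch, int freq) {
            this.ch = ch;
            this.freq = freq;
            this.left = null;
            this.right = null;
        }

        Node(Node left, Node right) {
            this.ch = '\0';
            this.freq = left.freq + right.freq; // total count of the subtree
            this.left = left;
            this.right = right;
        }

        boolean isLeaf() { return left == null; }
    }

    // Repeatedly combine the two lowest-frequency trees until one remains.
    static Node buildTree(Map<Character, Integer> freqs) {
        PriorityQueue<Node> queue =
            new PriorityQueue<>((x, y) -> Integer.compare(x.freq, y.freq));
        for (Map.Entry<Character, Integer> e : freqs.entrySet()) {
            queue.add(new Node(e.getKey(), e.getValue()));
        }
        while (queue.size() > 1) {
            Node first = queue.poll();
            Node second = queue.poll();
            queue.add(new Node(first, second));
        }
        return queue.poll();
    }

    // Record the path to each leaf (0 = left edge, 1 = right edge).
    static void fillTable(Node node, String path, Map<Character, String> table) {
        if (node.isLeaf()) {
            table.put(node.ch, path);
            return;
        }
        fillTable(node.left, path + "0", table);
        fillTable(node.right, path + "1", table);
    }

    // Encode by looking each character up in the table.
    static String encode(String text, Map<Character, String> table) {
        StringBuilder bits = new StringBuilder();
        for (char c : text.toCharArray()) {
            bits.append(table.get(c));
        }
        return bits.toString();
    }

    // Decode by following one edge per bit and emitting a character at each leaf.
    static String decode(Node root, String bits) {
        StringBuilder out = new StringBuilder();
        Node current = root;
        for (char bit : bits.toCharArray()) {
            current = (bit == '0') ? current.left : current.right;
            if (current.isLeaf()) {
                out.append(current.ch);
                current = root;
            }
        }
        return out.toString();
    }

    // Encoded size in bits: sum over c of freq(c) * len(c). Tree header excluded.
    static long encodedBits(Map<Character, Integer> freqs, Map<Character, String> table) {
        long bits = 0;
        for (Map.Entry<Character, Integer> e : freqs.entrySet()) {
            bits += (long) e.getValue() * table.get(e.getKey()).length();
        }
        return bits;
    }

    public static void main(String[] args) {
        Map<Character, Integer> freqs = new HashMap<>();
        freqs.put('a', 100); freqs.put('b', 20); freqs.put('c', 15);
        freqs.put('d', 30); freqs.put('e', 1);
        Node root = buildTree(freqs);
        Map<Character, String> table = new HashMap<>();
        fillTable(root, "", table);
        System.out.println(table);
        System.out.println(decode(root, encode("abacade", table)));
        System.out.println(encodedBits(freqs, table) + " bits encoded");
    }
}
```

With the slide's frequencies (a: 100, b: 20, c: 15, d: 30, e: 1) the code lengths come out as 1, 3, 4, 2, and 4 bits, matching the lengths in the encoding table, although the 0/1 labels on individual edges may differ from the slide. The encoded text then takes 284 bits, against 166 * 8 = 1328 bits uncompressed, before counting the bits needed to store the tree itself. The sketch does not handle an input with only one distinct character, where the root would be a leaf with an empty code.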