Huffman Trees - Lecture Notes - Data Structure and Algorithms | CS 245, Study notes of Data Structures and Algorithms

Material Type: Notes; Class: Data Struct & Algorithms; Subject: Computer Science; University: University of San Francisco (CA); Term: Unknown 1989;

Typology: Study notes

Pre 2010

Uploaded on 07/30/2009

Data Structures and Algorithms
Huffman Trees
Chris Brooks
Department of Computer Science
University of San Francisco
Department of Computer Science — University of San Francisco – p.1/23

10-0: Text Files
All files are represented as binary digits, including text files
Each character is represented by an integer code
  ASCII: American Standard Code for Information Interchange
A text file is a sequence of binary digits that represent the codes for each character

10-3: ASCII
ASCII is not terribly efficient
  All characters require 8 bits
  Frequently used characters require the same number of bits as infrequently used characters
We could be more efficient if frequently used characters required fewer than 8 bits, and less frequently used characters required more bits

10-5: Representing Codes as Trees
Want to encode only 4 characters: a, b, c, d (instead of 256 characters)
How many bits are required for each code, if each code has the same length?
  2 bits are required, since there are 4 possible options to distinguish

10-8: Representing Codes as Trees
a: 01, b: 00, c: 11, d: 10
[Figure: binary code tree with leaves a, b, c, d and edges labeled 0 and 1]

10-9: Prefix Codes
If no code is a prefix of any other code, then decoding the file is unambiguous.
  How do you know whether a string is one complete code, or part of another?
If all codes are the same length, then no code will be a prefix of any other code (trivially)
We can create variable-length codes, where no code is a prefix of any other code

10-10: Variable Length Codes
Variable-length code example: a: 0, b: 100, c: 101, d: 11
Decoding examples:
  100
  10011
  01101010010011

10-13: File Length
If we use the code a: 00, b: 01, c: 10, d: 11, how many bits are required to encode a file of 20 characters?
  20 characters * 2 bits/character = 40 bits

10-15: File Length
If we use the code a: 0, b: 100, c: 101, d: 11, how many bits are required to encode a file of 20 characters?
  It depends upon the number of a's, b's, c's and d's in the file

10-18: Decoding Files
We can use variable-length codes to encode a text file
Given the encoded file, and the tree representation of the codes, it is easy to decode the file
[Figure: code tree with leaves a, b, c, d and edges labeled 0 and 1]
Encoded input: 0111001010011

10-20: Decoding Files
Finding the kth character in the file is more tricky
  Need to decode the first (k-1) characters in the file, to determine where the kth character is in the file
Gain space, lose random access.
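
The decoding walk and the file-length question above can be sketched in Java. This is only an illustration under stated assumptions: the class name PrefixCodeSketch and its methods are hypothetical (they are not part of the course code), bits are held in a String of '0' and '1' characters rather than a real binary file, and the character counts used for the length calculation are made up.

```java
import java.util.Map;

// Hypothetical helper, not from the slides: decodes a bit string with a prefix code.
final class PrefixCodeSketch {
    /** Node of a binary code tree; only leaves carry a symbol. */
    static final class Node {
        Node zero, one;      // children reached by a 0 bit and a 1 bit
        Character symbol;    // non-null only at leaves
    }

    /** Builds the code tree from a symbol -> bit-string table. */
    static Node buildTree(Map<Character, String> codes) {
        Node root = new Node();
        for (Map.Entry<Character, String> e : codes.entrySet()) {
            Node cur = root;
            for (char bit : e.getValue().toCharArray()) {
                if (bit == '0') {
                    if (cur.zero == null) cur.zero = new Node();
                    cur = cur.zero;
                } else {
                    if (cur.one == null) cur.one = new Node();
                    cur = cur.one;
                }
            }
            cur.symbol = e.getKey();
        }
        return root;
    }

    /** Walks the tree bit by bit; each time a leaf is reached, emits it and restarts at the root. */
    static String decode(Node root, String bits) {
        StringBuilder out = new StringBuilder();
        Node cur = root;
        for (char bit : bits.toCharArray()) {
            cur = (bit == '0') ? cur.zero : cur.one;
            if (cur == null) throw new IllegalArgumentException("bits do not match any code");
            if (cur.symbol != null) {
                out.append(cur.symbol);
                cur = root;
            }
        }
        if (cur != root) throw new IllegalArgumentException("input ends in the middle of a code");
        return out.toString();
    }

    /** Total bits needed: sum over symbols of (occurrences * code length). */
    static int encodedLength(Map<Character, String> codes, Map<Character, Integer> counts) {
        int bits = 0;
        for (Map.Entry<Character, Integer> e : counts.entrySet()) {
            bits += e.getValue() * codes.get(e.getKey()).length();
        }
        return bits;
    }

    public static void main(String[] args) {
        Map<Character, String> codes = Map.of('a', "0", 'b', "100", 'c', "101", 'd', "11");
        Node root = buildTree(codes);
        System.out.println(decode(root, "0111001010011"));   // prints adbcaad
        System.out.println(decode(root, "01101010010011"));  // prints adacaabd
        // Made-up 20-character file: 14 a's, 2 b's, 2 c's, 2 d's.
        System.out.println(encodedLength(codes, Map.of('a', 14, 'b', 2, 'c', 2, 'd', 2))); // prints 30
    }
}
```

With these made-up counts the variable-length code needs 30 bits instead of the 40 bits of the fixed 2-bit code; a file dominated by b's and c's would need more than 40.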

10-23: Huffman Coding
For each code tree, we keep track of the total number of times the characters in that tree appear in the input file
We start with one code tree for each character that appears in the input file
We combine the two trees with the lowest frequencies, until all trees have been combined into one tree

10-24: Huffman Coding
Example: If the letters a-e have the frequencies a: 100, b: 20, c: 15, d: 30, e: 1
[Figure: initial forest of single-node trees a:100, d:30, c:15, b:20, e:1]

10-25: Huffman Coding
[Figure: forest after the first merge, with a new internal node :16]

10-28: Huffman Coding
[Figure: final tree, with internal nodes :16, :36, :66 and root :166]

10-29: Huffman Coding
Example: If the letters a-e have the frequencies a: 10, b: 10, c: 10, d: 10, e: 10
[Figure: initial forest of single-node trees a:10, d:10, c:10, b:10, e:10]

10-30: Huffman Coding
[Figure: forest after the first merge, with a new internal node :20]

10-33: Huffman Coding
[Figure: later step of the same example; node labels as extracted: :20 :20 :30 :30]

10-34: Huffman Trees & Tables
Once we have a Huffman tree, decoding a file is straightforward, but encoding a file requires a bit more information.
Given just the tree, finding an encoding can be difficult ...
What would we like to have, to help with encoding?
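
The merge loop from the Huffman Coding slides above can be sketched in Java with a priority queue. This is a minimal sketch, not the course's reference solution: the class name HuffmanSketch, the Node layout, and the choice to put the heavier of the two merged trees on the 0 branch are all assumptions made for illustration. The sketch also walks the finished tree once to produce a symbol-to-code lookup, which is the kind of structure that makes encoding easy. It assumes at least two distinct symbols.

```java
import java.util.Map;
import java.util.PriorityQueue;
import java.util.TreeMap;

// Hypothetical helper, not from the slides: builds a Huffman tree and its code table.
final class HuffmanSketch {
    static final class Node {
        final char symbol;       // meaningful only at leaves
        final int weight;        // total frequency of all characters in this subtree
        final Node left, right;  // both null at a leaf

        Node(char symbol, int weight) {
            this.symbol = symbol;
            this.weight = weight;
            this.left = null;
            this.right = null;
        }

        Node(Node left, Node right) {
            this.symbol = '\0';
            this.weight = left.weight + right.weight;
            this.left = left;
            this.right = right;
        }

        boolean isLeaf() { return left == null; }
    }

    /** Repeatedly merges the two lightest trees until a single tree remains. */
    static Node build(Map<Character, Integer> freq) {
        PriorityQueue<Node> forest =
                new PriorityQueue<>((x, y) -> Integer.compare(x.weight, y.weight));
        for (Map.Entry<Character, Integer> e : freq.entrySet()) {
            forest.add(new Node(e.getKey(), e.getValue()));
        }
        while (forest.size() > 1) {
            Node smaller = forest.poll();
            Node larger = forest.poll();
            // Assumption: heavier subtree goes on the 0 (left) branch.
            forest.add(new Node(larger, smaller));
        }
        return forest.poll();
    }

    /** Records the root-to-leaf path of every symbol (0 = left, 1 = right). */
    static Map<Character, String> codeTable(Node root) {
        Map<Character, String> table = new TreeMap<>();
        fill(root, "", table);
        return table;
    }

    private static void fill(Node n, String prefix, Map<Character, String> table) {
        if (n.isLeaf()) {
            table.put(n.symbol, prefix);
            return;
        }
        fill(n.left, prefix + "0", table);
        fill(n.right, prefix + "1", table);
    }

    public static void main(String[] args) {
        Map<Character, Integer> freq = new TreeMap<>();
        freq.put('a', 100);
        freq.put('b', 20);
        freq.put('c', 15);
        freq.put('d', 30);
        freq.put('e', 1);
        System.out.println(codeTable(build(freq)));
        // prints {a=0, b=100, c=1010, d=11, e=1011}
    }
}
```

For the first example the merges produce subtrees of weight 16, 36, 66 and 166, matching the slides. When frequencies tie, as in the all-10s example, the queue may pick either tree, so the tree shape can differ between runs of different implementations while the total encoded length stays the same.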

10-35: Encoding Tables
a  0
b  100
c  1010
d  11
e  1011
[Figure: Huffman tree for a:100, b:20, c:15, d:30, e:1, with internal nodes :16, :36, :66 and root :166]

10-38: Huffman Coding
To uncompress a file using Huffman coding:
  Read in the Huffman tree from the input file
  Read the input file bit by bit, traversing the Huffman tree as you go
  When a leaf is reached, write the corresponding character to the output file

10-39: Binary Files
  public BinaryFile(String filename, char readOrWrite)
  public boolean EndOfFile()
  public char readChar()
  public void writeChar(char c)
  public int readInt()
  public void writeInt(int i)
  public boolean readBit()
  public void writeBit(boolean bit)
  public void close()

10-40: Binary Files
  readBit   Read a single bit
  readChar  Read a single character (8 bits)
  readInt   Read a single int (9 bits) in the range -255 ... 255

10-43: Binary Files
If we write to a binary file: bit, bit, char, bit, int
And then read from the file: bit, char, bit, int, bit
What will we get out?
  Garbage! (except for the first bit)

10-44: Printing out Trees
To print out Huffman trees:
  Print out nodes in pre-order traversal
  Need a way of denoting which nodes are leaves and which nodes are interior nodes
    (Huffman trees are full: every node has 0 or 2 children)
  Print out 9 bits for each node: positive values for leaves, negative values for interior nodes
  Value printed for interior nodes doesn't matter, as long as it is negative

10-45: Command Line Arguments
public static void main(String args[])
The args parameter holds the input parameters
java MyProgram arg1 arg2 arg3
  args.length = 3
  args[0] = "arg1"
  args[1] = "arg2"
  args[2] = "arg3"
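
The pre-order scheme from the Printing out Trees slide can be sketched as follows. Since the implementation of the course's BinaryFile class is not shown here, this sketch swaps in a plain List<Integer> as a stand-in for a sequence of writeInt and readInt calls; the class name TreeIoSketch and its Node type are hypothetical. Interior nodes are written as -1, which is an arbitrary choice because any negative value would do, and leaf values are assumed to be character codes in the range 0 to 255.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper, not from the slides: pre-order (de)serialization of a full binary tree.
final class TreeIoSketch {
    static final class Node {
        final int symbol;        // character code at a leaf, -1 at an interior node
        final Node left, right;  // both null at a leaf

        Node(int symbol) {
            this.symbol = symbol;
            this.left = null;
            this.right = null;
        }

        Node(Node left, Node right) {
            this.symbol = -1;
            this.left = left;
            this.right = right;
        }

        boolean isLeaf() { return left == null; }
    }

    /** Pre-order: the node itself first, then its left subtree, then its right subtree. */
    static void write(Node n, List<Integer> out) {
        if (n.isLeaf()) {
            out.add(n.symbol);   // non-negative value marks a leaf
            return;
        }
        out.add(-1);             // negative value marks an interior node
        write(n.left, out);
        write(n.right, out);
    }

    /** Rebuilds the tree; works because every interior node has exactly two children. */
    static Node read(Iterator<Integer> in) {
        int value = in.next();
        if (value >= 0) {
            return new Node(value);
        }
        Node left = read(in);
        Node right = read(in);
        return new Node(left, right);
    }

    public static void main(String[] args) {
        // Tree matching the encoding table a:0, b:100, c:1010, d:11, e:1011.
        Node ce = new Node(new Node('c'), new Node('e'));
        Node bce = new Node(new Node('b'), ce);
        Node root = new Node(new Node('a'), new Node(bce, new Node('d')));

        List<Integer> values = new ArrayList<>();
        write(root, values);
        System.out.println(values);  // prints [-1, 97, -1, -1, 98, -1, 99, 101, 100]

        List<Integer> again = new ArrayList<>();
        write(read(values.iterator()), again);
        System.out.println(values.equals(again));  // prints true
    }
}
```

Reading must consume values in exactly the order they were written, which is the same ordering discipline the Binary Files slide warns about for mixed bit, char, and int reads.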