Huffman Compression: The Efficient Lossless Data Compression Algorithm, Lab Reports of Engineering

An introduction to Huffman compression, a lossless data compression algorithm developed by David A. Huffman in 1951. Huffman coding uses a frequency-sorted binary tree to automatically build a dictionary of variable-length codes for common symbols in a source file, replacing lengthy code groups with short code symbols. The document also discusses the design and representation of Huffman trees as recursive data structures.

Typology: Lab Reports

Pre 2010

Uploaded on 08/09/2009

Laboratory 11 - Huffman Compression

Introduction

Huffman Coding is an entropy encoding algorithm used for lossless data compression. Huffman Coding automatically builds a dictionary of variable-length codes for common strings of symbols in a source file. This dictionary is then used to replace common, lengthy code groups with short code symbols. The combined length of the dictionary and the encoded file is often dramatically shorter than that of the original file.

In 1951, David A. Huffman and his MIT information theory classmates were given the choice of a term paper or a final exam. The professor, Robert M. Fano, assigned a term paper on the problem of finding the most efficient binary code. Huffman, unable to prove that any existing codes were the most efficient, was about to give up and start studying for the final when he hit upon the idea of using a frequency-sorted binary tree and quickly proved this method the most efficient. Huffman thereby outdid his professor, who had developed a suboptimal algorithm with Claude Shannon. Huffman avoided the major flaw of the suboptimal Shannon-Fano coding by building the tree from the bottom up instead of from the top down.

Huffman compression replaces variable-length source code groups with variable-length compression code groups. Common letters like "T" and words like "AND" use fewer bits than unusual letters like "X". Huffman compression and Huffman trees are discussed on pages 398-402 and 447-454. A partial implementation of the HuffmanTree class begins on page 451.

Building and Representing Trees

There are fundamentally two ways to represent trees:

1. As a linked structure built from an internal Node class.
2. As a linked structure of trees.

Besides being able to add, delete, and retrieve data from our tree, we also want to be able to retrieve a subtree, detach a subtree, and attach a subtree.
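As a minimal sketch of the second representation, a linked structure of trees, the subtree operations just listed might look like the following. The class and method names here are illustrative assumptions, not code from the textbook:

```java
// Sketch of representation 2 above: a Tree whose branches are themselves Trees,
// supporting retrieval, detachment, and attachment of subtrees.
class Tree {
    Object data;
    Tree left, right;

    Tree(Object data) { this.data = data; }

    // Retrieve a subtree without removing it.
    Tree getLeft() { return left; }

    // Detach a subtree: unlink it from this tree and return it.
    Tree detachLeft() {
        Tree subtree = left;
        left = null;
        return subtree;
    }

    // Attach a subtree as the left branch.
    void attachLeft(Tree subtree) { left = subtree; }
}
```

Because each branch is itself a Tree, these operations are simple reference assignments rather than node-by-node copies.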
For Huffman trees, this suggests that we might want a HuffmanTree class with an internal HuffmanLeaf class instead of a Tree class and a Node class. In short, using a recursive definition for Tree may simplify our design and implementation tasks. Essentially, a Tree would consist of two different kinds of node: an internal node and a leaf node. Now notice that an internal node has to keep all of the same information as a Tree object! Consequently, we can replace our specification for the Tree class with our definition for InternalNode: we use the design we worked out for InternalNode for Tree and dispense with having a separate InternalNode class. Finally, recall that our left and right links must be able to attach to either leaves or trees. One approach to handling this is to define Leaf as a subclass of Tree which contains an additional data attribute. This works when you want to place all of the data at the leaves of the tree. If, however, you want to locate data internally, then you simply include a data attribute in your Tree class and dispense with having a separate Leaf class.

Trees as Recursive Data Structures

Any of the pointer-based tree representations involves a recursive data structure:

1. You can implement a BinarySearchTree class by having a BinarySearchTree class and a separate internal Node class. In this case, the Node class is a recursive data structure, as each Node contains a pair of left and right attribute variables, each of which is of class Node.

2. You can implement a BinarySearchTree class by having a BinarySearchTree class and a separate internal Leaf class which has a data attribute variable but lacks left and right attribute variables. In this case, BinarySearchTree is a recursive data structure, as each left and right attribute variable will belong to class BinarySearchTree.

3.
Finally, you can simply define a BinarySearchTree class where each BinarySearchTree object has an Object-valued data attribute as well as left and right branches to subtrees. These BinarySearchTree attribute variables make this realization of BinarySearchTree a recursive data structure.

The Java Collections Framework

We have been exploring a number of ADTs (Abstract Data Types) which are commonly used in designing software systems. You might wonder why Tree is not part of the Java Collections Framework. The reason is that we use trees as components to construct more intuitive ADTs, such as dictionaries and sets, which are in the Java Collections Framework.

The Map interface effectively replaces the old Dictionary class. Both of these data structures allow you to look up a data element by its key value. The Set interface defines set membership operations. Both of these interfaces have sorted extensions which add order properties: a SortedSet allows you to sequentially access the members of the set, and a SortedMap allows you to sequentially access the keys of the map.

Here is a simplified view of the Java Collections Framework (the original inheritance diagram is not reproduced here), beginning with the Map interface.

Map Interface

Method Summary

Object get(Object key)
    Returns the value to which this map maps the specified key.
int hashCode()
    Returns the hash code value for this map.
boolean isEmpty()
    Returns true if this map contains no key-value mappings.
Set keySet()
    Returns a set view of the keys contained in this map.
Object put(Object key, Object value)
    Associates the specified value with the specified key in this map (optional operation).
void putAll(Map t)
    Copies all of the mappings from the specified map to this map (optional operation).
Object remove(Object key)
    Removes the mapping for this key from this map if it is present (optional operation).
int size()
    Returns the number of key-value mappings in this map.
Collection values()
    Returns a collection view of the values contained in this map.
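To make the Map operations above concrete, here is a small illustrative sketch (not part of the lab code) that uses get and put to count character frequencies, the same pattern the transmitter in this lab applies to an input file:

```java
import java.util.HashMap;
import java.util.Map;

class MapDemo {
    public static void main(String[] args) {
        // Count character occurrences using the Map operations summarized above.
        Map<Character, Integer> freq = new HashMap<>();
        for (char c : "ABRACADABRA".toCharArray()) {
            Integer old = freq.get(c);               // returns null if c is not yet a key
            freq.put(c, old == null ? 1 : old + 1);  // associate c with its updated count
        }
        System.out.println(freq.get('A'));  // prints 5
        System.out.println(freq.size());    // prints 5 (distinct characters A, B, R, C, D)
    }
}
```

Note that the lab itself uses a plain int array indexed by character code for this step; a Map is shown here only to illustrate the interface.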
Set Interface

A Collection that contains no duplicate elements. More formally, sets contain no pair of elements e1 and e2 such that e1.equals(e2), and at most one null element. As implied by its name, this interface models mathematical sets.

Method Summary

boolean add(Object o)
    Adds the specified element to this set if it is not already present (optional operation).
boolean addAll(Collection c)
    Adds all of the elements in the specified collection to this set if they're not already present (optional operation).
void clear()
    Removes all of the elements from this set (optional operation).
boolean contains(Object o)
    Returns true if this set contains the specified element.
boolean containsAll(Collection c)
    Returns true if this set contains all of the elements of the specified collection.
boolean equals(Object o)
    Compares the specified object with this set for equality.
int hashCode()
    Returns the hash code value for this set.
boolean isEmpty()
    Returns true if this set contains no elements.
Iterator iterator()
    Returns an iterator over the elements in this set.
boolean remove(Object o)
    Removes the specified element from this set if it is present (optional operation).
boolean removeAll(Collection c)
    Removes from this set all of its elements that are contained in the specified collection (optional operation).
boolean retainAll(Collection c)
    Retains only the elements in this set that are contained in the specified collection (optional operation).
int size()
    Returns the number of elements in this set (its cardinality).
Object[] toArray()
    Returns an array containing all of the elements in this set.
Object[] toArray(Object[] a)
    Returns an array containing all of the elements in this set; the runtime type of the returned array is that of the specified array.

Getting Started

Create a new directory for the Software Library, and download the file archive into it.
After you expand the archive, you will need the HuffmanTree files in the CH08 directory and the BitString files from the CH09 directory. You should also refer to the discussion of how to build a custom HuffmanTree class beginning on page 448 of our textbook.

Software Engineering

Complete all methods of class HuffmanTree and test them using a document file and a Java source file on your computer. For testing purposes, you can use your source file for this project and any computer-readable multi-page text document. To complete this assignment you need both to compress a file and to decompress the compressed version of that file. The resulting decompressed version should be identical to the original version of the file. In addition, your program should report the lengths of both files and the compression ratio.

For full credit, be sure to implement a generic HuffmanTree class. This will allow your Huffman tree to store strings or groups of pixels in an image file. A generic HuffmanTree<T> requires a generic inner class HuffData<T>, where T is the data type of a symbol. Each type parameter <HuffData> in class HuffmanTree would be replaced by <HuffData<T>>, which indicates that T is a type parameter for class HuffData.

For extra credit, produce a String-based version of your Huffman compression program and compare compression efficiency between the character-oriented version and the string-oriented version. Be sure to turn in both versions of your program.

Project Summary

You will use a variety of data structures to implement this project. Effectively, your Huffman tree will implement a Dictionary. You should expect your Huffman tree to be fairly flat, like the one in figure 8.35 of our textbook.

1. Your transmitter will scan the input file to derive character frequency statistics. Collect these statistics in an array of integers where the index value corresponds to the character code and the value counts the number of occurrences of that character.

2.
Use the character frequency data which you just collected to construct a Huffman tree. Since this step builds a Huffman tree, you should think of doing this inside your Huffman tree constructor. In that case, you can work directly with nodes and dispense with worrying about trees until you have finished constructing your Huffman tree. Your Huffman tree constructor should accept your character frequency table as an argument. We will add and remove nodes from the priority queue, rather than trees as in the textbook. Each node should have the following structure:

   o A character value, equal to zero for interior nodes.
   o A weight, which for leaf nodes is set equal to the number of times that character appeared in the file.
   o A left branch, set equal to null for leaves.
   o A right branch, set equal to null for leaves.

3. Iteratively construct a leaf node for each character with non-zero frequency, and insert each of these nodes into your priority queue.

4. Iteratively remove pairs of nodes from the priority queue until you are left with a single node, which becomes the root of your Huffman tree.

5. Each time you remove a pair of nodes from the priority queue, create a new node which is linked to the two nodes you removed from your priority queue. Set the weight of this new node equal to the sum of the weights of the two nodes it is linked to. Then, put this new node into the priority queue.
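The construction steps above can be sketched with java.util.PriorityQueue as follows. This is an illustrative sketch under the node structure described in step 2; the Node and HuffmanBuilder names are assumptions for this example, not the textbook's HuffmanTree code:

```java
import java.util.PriorityQueue;

// Sketch of steps 2-5: build a Huffman tree from a character frequency table.
class Node implements Comparable<Node> {
    char symbol;       // character value; zero for interior nodes
    int weight;        // occurrence count for leaves, sum of children otherwise
    Node left, right;  // branches; null for leaves

    Node(char symbol, int weight) {       // leaf node
        this.symbol = symbol;
        this.weight = weight;
    }

    Node(Node left, Node right) {         // interior node linking two removed nodes
        this.weight = left.weight + right.weight;
        this.left = left;
        this.right = right;
    }

    // Order nodes by weight so the priority queue yields the lightest pair first.
    public int compareTo(Node other) { return Integer.compare(weight, other.weight); }
}

class HuffmanBuilder {
    // freq[c] counts occurrences of character code c (step 1 of the summary).
    static Node buildTree(int[] freq) {
        PriorityQueue<Node> pq = new PriorityQueue<>();
        for (int c = 0; c < freq.length; c++) {
            if (freq[c] > 0) {
                pq.add(new Node((char) c, freq[c]));  // step 3: one leaf per symbol
            }
        }
        while (pq.size() > 1) {          // step 4: merge until one node remains
            Node a = pq.remove();
            Node b = pq.remove();
            pq.add(new Node(a, b));      // step 5: parent weight = sum of the pair
        }
        return pq.remove();              // the root of the Huffman tree
    }
}
```

For example, with frequencies collected from "ABRACADABRA" (A=5, B=2, R=2, C=1, D=1), the root's weight equals 11, the total number of characters in the input.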