Download Huffman Codes: Efficient Data Compression Using Frequency-Based Encoding and more Study notes Data Structures and Algorithms in PDF only on Docsity! Pupil Text 17 MEP: Codes and Ciphers, UNIT 17 Huffman Codes Huffman Codes The most common way to represent letters and numbers in computing is by using the ASCII code, American Standard Code for Information Interchange This is based on a string of seven 'bits' where each bit can be either 'I' or '0'. For example, Character ASCII Code A 100 0001 B 100 0010 Cc 100 0011 D 100 0100 011 0001 2 011 0010 3 011 0011 The system can also be used for punctuation marks and symbols; the full list is given in the Appendix. Exercise 1 Using a seven bit construction, how many characters can be coded? This system is not particularly efficient for situations where, as with calculators, memory is limited, or when we are trying to condense data (as in the many compression programs that are vitally important for internet transmission of files). However, when transmitting, for example, a message, it is more efficient to use a code which has shorter lengths (i.e. number of bits used) for the most frequently used letters and longer lengths for the least used letters. This is the basis for the Huffman coding system, which was developed by David A Huffman as PhD student at Massachusetts Institute of Technology in 1952. We illustrate the method with an example that uses just five letters. Example 1 Construct a Huffman code for the five letters EAMNT which are listed in decreasing order of frequency of use. Pupil Text ES MEP: Codes and Ciphers, UNIT 17 Huffman Codes Solution You need to allocate the shortest codes to the letters with the highest frequency, so E is allocated to the code 1, A to 00, etc. This is best illustrated using a tree diagram. Start T N Branching to the left is coded '1' and to the right, '0'. This gives the code Letter Code Length E 1 1 A 00 2 M 010 3 N 0110 4 T Olll 4 and is particularly suitable if the frequencies of N and T are about the same. Example 2 Use the diagram or list to decode 0111000101010000110 Solution Using the tree diagram, follow the code until a letter is reached and then start again. 011 110 010 1 ol1!o 1 Olo Olo 110 rt |alM fel mw fal on