Binary Trees - Data Structures and Algorithm Analysis

Java Reference

In-Depth Information

5.6

Human Coding Trees

The space/time tradeoff principle from Section 3.9 states that one can often gain

an improvement in space requirements in exchange for a penalty in running time.

There are many situations where this is a desirable tradeoff. A typical example is

storing files on disk. If the files are not actively used, the owner might wish to

compress them to save space. Later, they can be uncompressed for use, which costs

some time, but only once.

We often represent a set of items in a computer program by assigning a unique

code to each item. For example, the standard ASCII coding scheme assigns a

unique eight-bit value to each character. It takes a certain minimum number of

bits to provide unique codes for each character. For example, it takes dlog 128e or

seven bits to provide the 128 unique codes needed to represent the 128 symbols of

the ASCII character set. 5

The requirement fordlog nebits to represent n unique code values assumes that

all codes will be the same length, as are ASCII codes. This is called a fixed-length

coding scheme. If all characters were used equally often, then a fixed-length coding

scheme is the most space efficient method. However, you are probably aware that

not all characters are used equally often in many applications. For example, the

various letters in an English language document have greatly different frequencies

of use.

Figure 5.23 shows the relative frequencies of the letters of the alphabet. From

this table we can see that the letter 'E' appears about 60 times more often than the

letter 'Z.' In normal ASCII, the words “DEED” and “MUCK” require the same

amount of space (four bytes). It would seem that words such as “DEED,” which

are composed of relatively common letters, should be storable in less space than

words such as “MUCK,” which are composed of relatively uncommon letters.

If some characters are used more frequently than others, is it possible to take

advantage of this fact and somehow assign them shorter codes? The price could

be that other characters require longer codes, but this might be worthwhile if such

characters appear rarely enough. This concept is at the heart of file compression

techniques in common use today. The next section presents one such approach to

assigning variable-length codes, called Huffman coding. While it is not commonly

used in its simplest form for file compression (there are better methods), Huffman

coding gives the flavor of such coding schemes. One motivation for studying Huff-

man coding is because it provides our first opportunity to see a type of tree structure

referred to as a search trie.

5 The ASCII standard is eight bits, not seven, even though there are only 128 characters repre-

sented. The eighth bit is used either to check for transmission errors, or to support “extended” ASCII

codes with an additional 128 characters.

Search WWH ::

Custom Search

Home