Java Reference
In-Depth Information
5.6
Human Coding Trees
The space/time tradeoff principle from Section 3.9 states that one can often gain
an improvement in space requirements in exchange for a penalty in running time.
There are many situations where this is a desirable tradeoff. A typical example is
storing files on disk. If the files are not actively used, the owner might wish to
compress them to save space. Later, they can be uncompressed for use, which costs
some time, but only once.
We often represent a set of items in a computer program by assigning a unique
code to each item. For example, the standard ASCII coding scheme assigns a
unique eight-bit value to each character. It takes a certain minimum number of
bits to provide unique codes for each character. For example, it takes dlog 128e or
seven bits to provide the 128 unique codes needed to represent the 128 symbols of
the ASCII character set. 5
The requirement fordlog nebits to represent n unique code values assumes that
all codes will be the same length, as are ASCII codes. This is called a fixed-length
coding scheme. If all characters were used equally often, then a fixed-length coding
scheme is the most space efficient method. However, you are probably aware that
not all characters are used equally often in many applications. For example, the
various letters in an English language document have greatly different frequencies
of use.
Figure 5.23 shows the relative frequencies of the letters of the alphabet. From
this table we can see that the letter 'E' appears about 60 times more often than the
letter 'Z.' In normal ASCII, the words “DEED” and “MUCK” require the same
amount of space (four bytes). It would seem that words such as “DEED,” which
are composed of relatively common letters, should be storable in less space than
words such as “MUCK,” which are composed of relatively uncommon letters.
If some characters are used more frequently than others, is it possible to take
advantage of this fact and somehow assign them shorter codes? The price could
be that other characters require longer codes, but this might be worthwhile if such
characters appear rarely enough. This concept is at the heart of file compression
techniques in common use today. The next section presents one such approach to
assigning variable-length codes, called Huffman coding. While it is not commonly
used in its simplest form for file compression (there are better methods), Huffman
coding gives the flavor of such coding schemes. One motivation for studying Huff-
man coding is because it provides our first opportunity to see a type of tree structure
referred to as a search trie.
5 The ASCII standard is eight bits, not seven, even though there are only 128 characters repre-
sented. The eighth bit is used either to check for transmission errors, or to support “extended” ASCII
codes with an additional 128 characters.
Search WWH ::




Custom Search