Java Reference
In-Depth Information
Letter Frequency Letter Frequency
A 77 N 67
B 17 O 67
C 32 P 20
D 42 Q 5
E 120 R 59
F 24 S 67
G 17 T 85
H 50 U 37
I 76 V 12
J 4 W 22
K 7 X 4
L 42 Y 22
M 24 Z 2
Figure5.23 Relative frequencies for the 26 letters of the alphabet as they ap-
pear in a selected set of English documents. “Frequency” represents the expected
frequency of occurrence per 1000 letters, ignoring case.
5.6.1
Building Human Coding Trees
Huffman coding assigns codes to characters such that the length of the code de-
pends on the relative frequency or weight of the corresponding character. Thus, it
is a variable-length code. If the estimated frequencies for letters match the actual
frequency found in an encoded message, then the length of that message will typi-
cally be less than if a fixed-length code had been used. The Huffman code for each
letter is derived from a full binary tree called the Huffman coding tree, or simply
the Huffman tree. Each leaf of the Huffman tree corresponds to a letter, and we
define the weight of the leaf node to be the weight (frequency) of its associated
letter. The goal is to build a tree with the minimum external path weight. Define
the weighted path length of a leaf to be its weight times its depth. The binary tree
with minimum external path weight is the one with the minimum sum of weighted
path lengths for the given set of leaves. A letter with high weight should have low
depth, so that it will count the least against the total path length. As a result, another
letter might be pushed deeper in the tree if it has less weight.
The process of building the Huffman tree for n letters is quite simple. First, cre-
ate a collection of n initial Huffman trees, each of which is a single leaf node con-
taining one of the letters. Put the n partial trees onto a priority queue organized by
weight (frequency). Next, remove the first two trees (the ones with lowest weight)
from the priority queue. Join these two trees together to create a new tree whose
root has the two trees as children, and whose weight is the sum of the weights of the
two trees. Put this new tree back into the priority queue. This process is repeated
until all of the partial Huffman trees have been combined into one.
 
Search WWH ::




Custom Search