11.2 Huffman Coding
Since the entropy is less than two bits, in theory we should be able to convey this information using fewer than two bits per outcome on average. What if we encoded these events differently, as below:
Outcome 1: Probability = 0.5, encode as 0.
Outcome 2: Probability = 0.25, encode as 10.
Outcome 3: Probability = 0.125, encode as 110.
Outcome 4: Probability = 0.125, encode as 111.
Half the time we would represent the data with one bit, one quarter of the time with two bits, and one quarter of the time with three bits.
Average number of bits = 0.5 × 1 + 0.25 × 2 + 0.125 × 3 + 0.125 × 3 = 1.75
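As a quick check, here is a minimal Python sketch (the variable names and structure are illustrative, not from the text) that computes both the entropy of these four outcomes and the average codeword length of the encoding above. For these particular probabilities, which are all powers of two, the two quantities agree exactly at 1.75 bits.

```python
import math

# Probabilities and codewords for the four-outcome example above
# (labels such as "outcome 1" are illustrative).
probabilities = {"outcome 1": 0.5, "outcome 2": 0.25,
                 "outcome 3": 0.125, "outcome 4": 0.125}
codewords = {"outcome 1": "0", "outcome 2": "10",
             "outcome 3": "110", "outcome 4": "111"}

# Entropy: -sum(p * log2(p)); average length: sum(p * codeword length).
entropy = -sum(p * math.log2(p) for p in probabilities.values())
average_bits = sum(p * len(codewords[o]) for o, p in probabilities.items())

print(entropy)       # 1.75
print(average_bits)  # 1.75
```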
The nice thing about this encoding is that the bits can be put together into a continuous bit stream and unambiguously decoded: because no codeword is a prefix of any other, the decoder always knows where one codeword ends and the next begins.
010110111110100
The bit stream above can only represent one possible outcome sequence: outcome 1, outcome 2, outcome 3, outcome 4, outcome 3, outcome 2, outcome 1.
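To make the decoding concrete, here is a small Python sketch (illustrative names, not from the text) that recovers this sequence from the bit stream above. Because the code is prefix-free, the decoder simply accumulates bits until they match a codeword, emits that outcome, and starts over.

```python
codewords = {"0": "outcome 1", "10": "outcome 2",
             "110": "outcome 3", "111": "outcome 4"}

def decode(bitstream):
    outcomes, current = [], ""
    for bit in bitstream:
        current += bit
        if current in codewords:   # a complete codeword has been read
            outcomes.append(codewords[current])
            current = ""
    return outcomes

print(decode("010110111110100"))
# -> ['outcome 1', 'outcome 2', 'outcome 3', 'outcome 4',
#     'outcome 3', 'outcome 2', 'outcome 1']
```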
An algorithm known as Huffman coding works in this manner, assigning the shortest codewords (the bit patterns that the outcomes are mapped to, or encoded as) to the events of highest probability; a short code sketch of this construction follows the Morse code example below. For example, Huffman codes are used in JPEG-based image compression. Originally, this concept was used over 150 years ago in Morse code for telegraphs, where each letter of the alphabet is mapped to a series of short dots and long dashes.
Common letters:
E  ·
I  ··
T  -
Uncommon letters:
X  -··-
Y  -·--
Z  --··
In Morse code, the higher probability letters are encoded in
a few dots, or a single dash. The lower probability letters use
several dashes and dots. This minimized, on average, the
amount of time required by the operator to send a telegraph
message.
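For completeness, below is a minimal Python sketch of the greedy Huffman construction mentioned above (the function and variable names are illustrative, not from the text). It repeatedly merges the two least probable entries, prefixing one group's codewords with 0 and the other's with 1, until a single code remains. Applied to the four-outcome example, it reproduces the code listed earlier.

```python
import heapq
import itertools

def huffman_code(probs):
    """Build a binary prefix code by repeatedly merging the two least
    probable entries (the standard greedy Huffman construction)."""
    tiebreak = itertools.count()  # keeps heap comparisons away from the dicts
    heap = [(p, next(tiebreak), {symbol: ""}) for symbol, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, group1 = heapq.heappop(heap)   # least probable
        p2, _, group2 = heapq.heappop(heap)   # second least probable
        merged = {s: "0" + code for s, code in group1.items()}
        merged.update({s: "1" + code for s, code in group2.items()})
        heapq.heappush(heap, (p1 + p2, next(tiebreak), merged))
    return heap[0][2]

print(huffman_code({"outcome 1": 0.5, "outcome 2": 0.25,
                    "outcome 3": 0.125, "outcome 4": 0.125}))
# -> {'outcome 1': '0', 'outcome 2': '10', 'outcome 3': '110', 'outcome 4': '111'}
# (other assignments of 0/1 within each merge would be equally optimal)
```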