Huffman Coding - Introduction to Data Compression

Databases Reference

In-Depth Information

to a . The next bit is a 0, which traces a path from the root to the NYT node. The next 4 bits,

1000, correspond to the decimal number 8, which is less than 10, so we read in one more bit

to get the 5-bit word 10001. The decimal equivalent of this 5-bit word plus one is 18, which

is the index for r . We decode the symbol r and then update the tree. The next 2 bits, 00, again

trace a path to the NYT node. We read the next 4 bits, 0001. Since this corresponds to the

decimal number 1, which is less than 10, we read another bit to get the 5-bit word 00011. To

get the index of the received symbol in the NYT list, we add one to the decimal value of this

5-bit word. The value of the index is 4, which corresponds to the symbol d . Continuing in this

fashion, we decode the sequence aard

v

a .

Although the Huffman coding algorithm is one of the best-known variable-length coding

algorithms, there are some other lesser-known algorithms that can be very useful in certain situ-

ations. In particular, the Golomb-Rice codes and the Tunstall codes are becoming increasingly

popular. We describe these codes in the following sections.

3.5 Golomb Codes

The Golomb-Rice codes belong to a family of codes designed to encode integers with the

assumption that the larger an integer, the lower its probability of occurrence. The simplest

code for this situation is the unary code. The unary code for a positive integer n is simply

n 1s followed by a 0. Thus, the code for 4 is 11110, and the code for 7 is 11111110. The

unary code is the same as the Huffman code for the semi-infinite alphabet

{

1

,

2

,

3

,... }

with

probability model

1

2 k

Because the Huffman code is optimal, the unary code is also optimal for this probability model.

Although the unary code is optimal in very restricted conditions, we can see that it is

certainly very simple to implement. One step higher in complexity are a number of coding

schemes that split the integer into two parts, representing one part with a unary code and the

other part with a different code. An example of such a code is the Golomb code. Other

examples can be found in [ 26 ].

The Golomb code is described in a succinct paper [ 27 ] by Solomon Golomb, which begins

“Secret Agent 00111 is back at the Casino again, playing a game of chance, while the fate

of mankind hangs in the balance.” Agent 00111 requires a code to represent runs of success

in a roulette game, and Golomb provides it! The Golomb code is actually a family of codes

parameterized by an integer m

[

]=

P

k

>

0. In the Golomb code with parameter m , we represent an

>

integer n

0 using two numbers q and r , where

n

m

q

=

and

r

=

n

−

qm

Introduction to Data Compression

Search WWH ::

Custom Search

Home