Databases Reference
In-Depth Information
T A B L E 5 . 4
The initial dictionary.
Index
Entry
1
w
2
a
3
b
Although this is an extreme situation, there are less drastic circumstances in which the finite
view of the past would be a drawback. The LZ78 algorithm solves this problem by dropping
the reliance on the search buffer and keeping an explicit dictionary. This dictionary has to be
built at both the encoder and decoder, and care must be taken that the dictionaries are built in an
identical manner. The inputs are coded as a double
, with i being an index corresponding
to the dictionary entry that was the longest match to the input and c being the code for the
character in the input following the matched portion of the input. As in the case of LZ77, the
index value of 0 is used in the case of no match. This double then becomes the newest entry in
the dictionary. Thus, each new entry into the dictionary is one new symbol concatenated with
an existing dictionary entry. To see how the LZ78 algorithm works, consider the following
example.
i
,
c
Examp l e 5 . 4 . 2 : The LZ78 Approach
Let us encode the following sequence using the LZ78 approach: 1
w
abba
b
/
w
abba
/
b
w
abba
/
b
w
abba
b
/
w
oo
/
b
w
oo
/
b
w
oo
.
where
b stands for space. Initially, the dictionary is empty, so the first few symbols encountered
are encoded with the index value set to 0. The first three encoder outputs are
/
0
,
C
(w)
,
, and the dictionary looks like Table 5.4 .
The fourth symbol is a b , which is the third entry in the dictionary. If we append the next
symbol, we would get the pattern ba , which is not in the dictionary, so we encode these two
symbols as
0
,
C
(
a
)
, and
0
,
C
(
b
)
and add the pattern ba as the fourth entry in the dictionary. Continuing
in this fashion, the encoder output and the dictionary develop as in Table 5.5 . Notice that
the entries in the dictionary generally keep getting longer, and if this particular sentence was
repeated often, as it is in the song, after a while the entire sentence would be an entry in the
dictionary.
3
,
C
(
a
)
While the LZ78 algorithm has the ability to capture patterns and hold them indefinitely, it
also has a rather serious drawback. As seen from the example, the dictionary keeps growing
without bound. In a practical situation, we would have to stop the growth of the dictionary at
some stage and then either prune it back or treat the encoding as a fixed dictionary scheme.
We will discuss some possible approaches when we study applications of dictionary coding.
1 “The Monster Song” from Sesame Street .
 
Search WWH ::




Custom Search