Databases Reference
In-Depth Information
Match pointer
r
r
a
a
b
r
a
a
a
d
a
b
a
r
a
r
r
a
Search buffer
Look ahead buffer
F I GU R E 5 . 1
Encoding using the LZ77 approach.
buffer. The distance of the pointer from the lookahead buffer is called the offset . The encoder
then examines the symbols following the symbol at the pointer location to see if they match
consecutive symbols in the lookahead buffer. The number of consecutive symbols in the search
buffer that match consecutive symbols in the lookahead buffer, starting with the first symbol,
is called the length of the match. The encoder searches the search buffer for the longest match.
Once the longest match has been found, the encoder encodes it with a triple
, where o
is the offset, l is the length of the match, and c is the codeword corresponding to the symbol in
the lookahead buffer that follows the match. For example, in Figure 5.1 the pointer is pointing
to the beginning of the longest match. The offset o in this case is 7, the length of the match l
is 4, and the symbol in the lookahead buffer following the match is r .
The reason for sending the third element in the triple is to take care of the situation where
no match for the symbol in the lookahead buffer can be found in the search buffer. In this case,
the offset and match-length values are set to 0, and the third element of the triple is the code
for the symbol itself.
If the size of the search buffer is S , the size of the window (search and lookahead buffers)
is W , and the size of the source alphabet is A , then the number of bits needed to code the triple
using fixed-length codes is
o
,
l
,
c
+
+
log 2 S
log 2 W
log 2 A
. Notice that the second term is
. The reason for this is that the length of the match can actually exceed
the length of the search buffer. We will see how this happens in Example 5.4.1 .
In the following example, we will look at three different possibilities that may be encoun-
tered during the coding process.
log 2 W
, not
log 2 S
1. There is no match for the next character to be encoded in the window.
2. There is a match.
3. The matched string extends inside the lookahead buffer.
Examp l e 5 . 4 . 1 : The LZ77 Approach
Suppose the sequence to be encoded is
...
cabracadabrarrarrad
...
Suppose the length of the window is 13, the size of the lookahead buffer is 6, and the current
condition is
cabraca dabrar
with dabrar in the lookahead buffer. We look back in the already encoded portion of the
window to find a match for d . As we can see, there is no match, so we transmit the triple
Search WWH ::




Custom Search