Databases Reference
In-Depth Information
T A B L E 6 . 15
Sequences sorted into
lexicographic order.
0
/ bi
s
/ bt het hi
s
1
/ bt het hi
s
/ bi
s
2 et hi
s
/ bi
s
/ bt h
3 het hi
s
/ bi
s
/ bt
4 hi
s
/ bi
s
/ bt het
/ bi
/ bt het h
5
i
s
s
/ bt het hi
b
6
i
s
s
7
s
/ bi
s
/ bt het hi
8
s
/ bt het hi
s
/ bi
9
t he
t hi
s
/
bi
s
/
b
10
t hi
s
/
bi
s
/
bt he
Example6.4.2:
In our example
/
b
/
s
s
h
t
t
h
/
b
e
h
h
i
i
s
s
t
t
F
=
L
=
b
i
i
/
b
e
the original sequence is sequence number 10, so the first letter in the original sequence is
F
t . To find the letter following t , we look for t in the array L . There are two t 's in
L . Which should we use? The t in F that we are working with is the lower of two t 's, so we
pick the lower of two t 's in L .Thisis L
[
10
]=
[
4
]
. Therefore, the next letter in our reconstructed
sequence is F
h . The reconstructed sequence to this point is th . To find the next letter,
we look for h in the L array. Again there are two h 's. The h at F
[
4
]=
is the lower of two h 's in
F , so we pick the lower of the two h 's in L . This is the fifth element of L , so the next element
in our decoded sequence is F
[
4
]
i . The decoded sequence to this point is thi . The process
continues as depicted in Figure 6.1 to generate the original sequence.
[
5
]=
Why go through all this trouble? After all, we are going from a sequence of length N
to another sequence of length N plus an index value. It appears that we are actually causing
expansion instead of compression. The answer is that the sequence L can be compressed much
more efficiently than the original sequence. Even in our small example, we have runs of like
symbols. This will happen a lot more when N is large. Consider a large sample of text that
has been cyclically shifted and sorted in a list A . Consider all the rows of A beginning with
 
Search WWH ::




Custom Search