Databases Reference
In-Depth Information
T A B L E 6 . 15
Sequences sorted into
lexicographic order.
0
/
bi
s
/
bt het hi
s
1
/
bt het hi
s
/
bi
s
2
et hi
s
/
bi
s
/
bt h
3
het hi
s
/
bi
s
/
bt
4
hi
s
/
bi
s
/
bt het
/
bi
/
bt het h
5
i
s
s
/
bt het hi
b
6
i
s
s
7
s
/
bi
s
/
bt het hi
8
s
/
bt het hi
s
/
bi
9
t he
t hi
s
/
bi
s
/
b
10
t hi
s
/
bi
s
/
bt he
Example6.4.2:
In our example
⎡
⎤
⎡
⎤
/
b
/
s
s
h
t
t
h
/
⎣
⎦
⎣
⎦
b
e
h
h
i
i
s
s
t
t
F
=
L
=
b
i
i
/
b
e
the original sequence is sequence number 10, so the first letter in the original sequence is
F
t
. To find the letter following
t
, we look for
t
in the array
L
. There are two
t
's in
L
. Which should we use? The
t
in
F
that we are working with is the lower of two
t
's, so we
pick the lower of two
t
's in
L
.Thisis
L
[
10
]=
[
4
]
. Therefore, the next letter in our reconstructed
sequence is
F
h
. The reconstructed sequence to this point is
th
. To find the next letter,
we look for
h
in the
L
array. Again there are two
h
's. The
h
at
F
[
4
]=
is the lower of two
h
's in
F
, so we pick the lower of the two
h
's in
L
. This is the fifth element of
L
, so the next element
in our decoded sequence is
F
[
4
]
i
. The decoded sequence to this point is
thi
. The process
continues as depicted in Figure
6.1
to generate the original sequence.
[
5
]=
Why go through all this trouble? After all, we are going from a sequence of length
N
to another sequence of length
N
plus an index value. It appears that we are actually causing
expansion instead of compression. The answer is that the sequence
L
can be compressed much
more efficiently than the original sequence. Even in our small example, we have runs of like
symbols. This will happen a lot more when
N
is large. Consider a large sample of text that
has been cyclically shifted and sorted in a list
A
. Consider all the rows of
A
beginning with