Context-Based Compression - Introduction to Data Compression

Databases Reference

In-Depth Information

T A B L E 6 . 15

Sequences sorted into

lexicographic order.

/ bi

/ bt het hi

/ bi

2 et hi

/ bi

/ bt h

3 het hi

/ bi

/ bt

4 hi

/ bi

/ bt het

/ bi

/ bt het h

/ bt het hi

/ bi

/ bt het hi

/ bi

t he

t hi

bt he

Example6.4.2:

In our example

⎡

⎤

⎡

⎤

⎣

⎦

⎣

⎦

the original sequence is sequence number 10, so the first letter in the original sequence is

t . To find the letter following t , we look for t in the array L . There are two t 's in

L . Which should we use? The t in F that we are working with is the lower of two t 's, so we

pick the lower of two t 's in L .Thisis L

[

]

. Therefore, the next letter in our reconstructed

sequence is F

h . The reconstructed sequence to this point is th . To find the next letter,

we look for h in the L array. Again there are two h 's. The h at F

[

is the lower of two h 's in

F , so we pick the lower of the two h 's in L . This is the fifth element of L , so the next element

in our decoded sequence is F

[

]

i . The decoded sequence to this point is thi . The process

continues as depicted in Figure 6.1 to generate the original sequence.

[

Why go through all this trouble? After all, we are going from a sequence of length N

to another sequence of length N plus an index value. It appears that we are actually causing

expansion instead of compression. The answer is that the sequence L can be compressed much

more efficiently than the original sequence. Even in our small example, we have runs of like

symbols. This will happen a lot more when N is large. Consider a large sample of text that

has been cyclically shifted and sorted in a list A . Consider all the rows of A beginning with

Introduction to Data Compression

Search WWH ::

Custom Search

Home