Context-Based Compression - Introduction to Data Compression

Databases Reference

In-Depth Information

1

A

0

B

0

1

0

A'

1

F I GU R E 6 . 3

A three-state model obtained by cloning.

the algorithm. In fact, we have been rather vague about a number of implementation issues.

We will attempt to rectify the situation.

There are a number of issues that need to be addressed in order to implement this algorithm:

1. What is the initial number of states?

2. How do we estimate probabilities?

3. How do we decide when a state needs to be cloned?

4. What do we do when the number of states becomes too large?

Let's answer each question in turn.

We can start the encoding process with a single state with two self-loops for 0 and 1. This

state can be cloned to two and then a higher number of states. In practice, it has been found

that, depending on the particular application, it is more efficient to start with a larger number

of states than one.

The probabilities from a given state can be estimated by simply counting the number of

times a 0 or a 1 occurs in that state divided by the number of times the particular state is

occupied. For example, if in state V the number of times a 0 occurs is denoted by n 0

and the

number of times a 1 occurs is denoted by n 1 , then

n 0

P

(

0

|

V

) =

n 0

n 1

+

n 1

P

(

1

|

V

) =

n 0

n 1

+

What if a 1 has never previously occurred in this state? This approach would assign a

probability of zero to the occurrence of a 1. This means that there will be no subinterval

assigned to the possibility of a 1 occurring, and when it does occur, we will not be able to

Introduction to Data Compression

Search WWH ::

Custom Search

Home