Databases Reference
In-Depth Information
4.4.2 Algorithm Implementation
In Section 4.3.1 , we developed a recursive algorithm for the boundaries of the interval con-
taining the tag for the sequence being encoded as
l ( n ) =
l ( n 1 ) + (
u ( n 1 )
l ( n 1 ) )
F X (
x n
1
)
(22)
u ( n ) =
l ( n 1 ) + (
u ( n 1 )
l ( n 1 ) )
F X (
x n )
(23)
where x n is the value of the random variable corresponding to the n th observed symbol, l ( n )
is the lower limit of the tag interval at the n th iteration, and u ( n )
is the upper limit of the tag
interval at the n th iteration.
Before we can implement this algorithm, there is one major problem we have to resolve.
Recall that the rationale for using numbers in the interval [0, 1) as a tag was that there are an
infinite number of numbers in this interval. However, in practice, the number of numbers that
can be uniquely represented on a machine is limited by the maximum number of digits (or
bits) we can use for representing the number. Consider the values of l ( n ) and u ( n ) in Example
4.3.5 .As n gets larger, these values come closer and closer together. This means that in order
to represent all of the subintervals uniquely, we need increasing precision as the length of the
sequence increases. In a system with finite precision, the two values are bound to converge,
and we will lose all information about the sequence from the point at which the two values
converged. To avoid this situation, we need to rescale the interval. However, we have to
do it in a way that will preserve the information that is being transmitted. We would also
like to perform the encoding incrementally —that is, to transmit portions of the code as the
sequence is being observed, rather than wait until the entire sequence has been observed before
transmitting the first bit. The algorithm we describe in this section takes care of the problems
of synchronized rescaling and incremental encoding.
As the interval becomes narrower, we have three possibilities:
1. The interval is entirely confined to the lower half of the unit interval [0, 0.5).
2. The interval is entirely confined to the upper half of the unit interval [0.5, 1.0).
3. The interval straddles the midpoint of the unit interval.
We will look at the third case a little later in this section. First, let us examine the first two
cases. Once the interval is confined to either the upper or lower half of the unit interval, it
is forever confined to that half of the unit interval. The most significant bit of the binary
representation of all numbers in the interval [0, 0.5) is 0, and the most significant bit of the
binary representation of all numbers in the interval [0.5, 1] is 1. Therefore, once the interval
gets restricted to either the upper or lower half of the unit interval, the most significant bit of
the tag is fully determined. Thus, without waiting to see what the rest of the sequence looks
like, we can indicate to the decoder whether the tag is confined to the upper or lower half of
the unit interval by sending a 1 for the upper half and a 0 for the lower half. The bit that we
send is also the first bit of the tag.
Once the encoder and decoder know which half contains the tag, we can ignore the half of
the unit interval not containing the tag and concentrate on the half containing the tag. As our
arithmetic is of finite precision, we can do this best by mapping the half interval containing
Search WWH ::




Custom Search