Database Reference
In-Depth Information
2 N sequences of N bits, but fewer than 2 N representations, there must be two different bit
strings w and x that have the same representation. Since w x , they must differ in at least
one bit. Let the last k − 1 bits of w and x agree, but let them differ on the k th bit from the
right end.
EXAMPLE 4.10 If w = 0101 and x = 1010, then k = 1, since scanning from the right, they
first disagree at position 1. If w = 1001 and x = 0101, then k = 3, because they first disagree
at the third position from the right.
Suppose the data representing the contents of the window is whatever sequence of bits
represents both w and x . Ask the query “how many 1s are in the last k bits?” The query-
answering algorithm will produce the same answer, whether the window contains w or x ,
because the algorithm can only see their representation. But the correct answers are surely
different for these two bit-strings. Thus, we have proved that we must use at least N bits to
answer queries about the last k bits for any k .
In fact, we need N bits, even if the only query we can ask is “how many 1s are in the
entire window of length N ?” The argument is similar to that used above. Suppose we use
fewer than N bits to represent the window, and therefore we can find w , x , and k as above.
It might be that w and x have the same number of 1s, as they did in both cases of Example
4.10 . However, if we follow the current window by any N k bits, we will have a situ-
ation where the true window contents resulting from w and x are identical except for the
leftmost bit, and therefore, their counts of 1s are unequal. However, since the representa-
tions of w and x are the same, the representation of the window must still be the same if we
feed the same bit sequence to these representations. Thus, we can force the answer to the
query “how many 1s in the window?” to be incorrect for one of the two possible window
contents.
4.6.2
The Datar-Gionis-Indyk-Motwani Algorithm
We shall present the simplest case of an algorithm called DGIM. This version of the al-
gorithm uses O (log 2 N ) bits to represent a window of N bits, and allows us to estimate the
number of 1s in the window with an error of no more than 50%. Later, we shall discuss an
improvement of the method that limits the error to any fraction ϵ > 0, and still uses only
O (log 2 N ) bits (although with a constant factor that grows as ϵ shrinks).
To begin, each bit of the stream has a timestamp , the position in which it arrives. The
first bit has timestamp 1, the second has timestamp 2, and so on. Since we only need to
distinguish positions within the window of length N , we shall represent timestamps modulo
N , so they can be represented by log 2 N bits. If we also store the total number of bits ever
Search WWH ::




Custom Search