Database Reference
In-Depth Information
(1) Why the number of buckets representing a window must be small.
(2) How to estimate the number of 1s in the last k bits for any k , with an error no greater
than 50%.
(3) How to maintain the DGIM conditions as new bits enter the stream.
4.6.3
Storage Requirements for the DGIM Algorithm
We observed that each bucket can be represented by O (log N ) bits. If the window has length
N , then there are no more than N 1s, surely. Suppose the largest bucket is of size 2 j . Then j
cannot exceed log 2 N , or else there are more 1s in this bucket than there are 1s in the entire
window. Thus, there are at most two buckets of all sizes from log 2 N down to 1, and no
buckets of larger sizes.
We conclude that there are O (log N ) buckets. Since each bucket can be represented in
O (log N ) bits, the total space required for all the buckets representing a window of size N
is O (log 2 N ).
4.6.4
Query Answering in the DGIM Algorithm
Suppose we are asked how many 1s there are in the last k bits of the window, for some 1
k N . Find the bucket b with the earliest timestamp that includes at least some of the k
most recent bits. Estimate the number of 1s to be the sum of the sizes of all the buckets to
the right (more recent) than bucket b , plus half the size of b itself.
EXAMPLE 4.12 Suppose the stream is that of Fig. 4.2 , and k = 10. Then the query asks for
the number of 1s in the ten rightmost bits, which happen to be 0110010110 . Let the cur-
rent timestamp (time of the rightmost bit) be t . Then the two buckets with one 1, having
timestamps t − 1 and t − 2 are completely included in the answer. The bucket of size 2,
with timestamp t − 4, is also completely included. However, the rightmost bucket of size
4, with timestamp t − 8 is only partly included. We know it is the last bucket to contribute
to the answer, because the next bucket to its left has timestamp less than t − 9 and thus
is completely out of the window. On the other hand, we know the buckets to its right are
completely inside the range of the query because of the existence of a bucket to their left
with timestamp t − 9 or greater.
Our estimate of the number of 1s in the last ten positions is thus 6. This number is the
two buckets of size 1, the bucket of size 2, and half the bucket of size 4 that is partially
within range. Of course the correct answer is 5.
Search WWH ::




Custom Search