Databases Reference
In-Depth Information
Our estimate of the number of 1's in the last ten positions is thus 6. This
number is the two buckets of size 1, the bucket of size 2, and half the bucket of
size 4 that is partially within range. Of course the correct answer is 5.
2
Suppose the above estimate of the answer to a query involves a bucket b
of size 2 j that is partially within the range of the query. Let us consider how
far from the correct answer c our estimate could be. There are two cases: the
estimate could be larger or smaller than c.
Case 1 : The estimate is less than c. In the worst case, all the 1's of b are
actually within the range of the query, so the estimate misses half bucket b, or
2 j−1 1's. But in this case, c is at least 2 j ; in fact it is at least 2 j+1 −1, since
there is at least one bucket of each of the sizes 2 j−1 , 2 j−2 , . . . , 1. We conclude
that our estimate is at least 50% of c.
Case 2 : The estimate is greater than c. In the worst case, only the rightmost
bit of bucket b is within range, and there is only one bucket of each of the sizes
smaller than b. Then c = 1 + 2 j−1 + 2 j−2 ++ 1 = 2 j and the estimate we
give is 2 j−1 + 2 j−1 + 2 j−2 ++ 1 = 2 j + 2 j−1 −1. We see that the estimate
is no more than 50% greater than c.
4.6.5
Maintaining the DGIM Conditions
Suppose we have a window of length N properly represented by buckets that
satisfy the DGIM conditions. When a new bit comes in, we may need to modify
the buckets, so they continue to represent the window and continue to satisfy
the DGIM conditions. First, whenever a new bit enters:
•Check the leftmost (earliest) bucket. If its timestamp has now reached
the current timestamp minus N , then this bucket no longer has any of its
1's in the window. Therefore, drop it from the list of buckets.
Now, we must consider whether the new bit is 0 or 1. If it is 0, then no
further change to the buckets is needed. If the new bit is a 1, however, we may
need to make several changes. First:
•Create a new bucket with the current timestamp and size 1.
If there was only one bucket of size 1, then nothing more needs to be done.
However, if there are now three buckets of size 1, that is one too many. We fix
this problem by combining the leftmost (earliest) two buckets of size 1.
•To combine any two adjacent buckets of the same size, replace them by
one bucket of twice the size. The timestamp of the new bucket is the
timestamp of the rightmost (later in time) of the two buckets.
Combining two buckets of size 1 may create a third bucket of size 2. If so,
we combine the leftmost two buckets of size 2 into a bucket of size 4. That, in
turn, may create a third bucket of size 4, and if so we combine the leftmost two
Search WWH ::




Custom Search