Database Reference
In-Depth Information
There is a pattern to the distribution of bucket sizes as we execute the basic algorithm of Section 4.6.5 . Think of two
buckets of size 2 j as a ”1” in position j and one bucket of size 2 j as a ”0” in that position. Then as 1s arrive in the
stream, the bucket sizes after each 1 form consecutive binary integers. The occasional long sequences of bucket com-
binations are analogous to the occasional long rippling of carries as we go from an integer like 101111 to 110000.
No matter what j is, this fraction is upper bounded by 1/( r −1). Thus, by picking r suffi-
ciently large, we can limit the error to any desired ϵ > 0.
4.6.7
Extensions to the Counting of Ones
It is natural to ask whether we can extend the technique of this section to handle aggrega-
tions more general than counting 1s in a binary stream. An obvious direction to look is to
consider streams of integers and ask if we can estimate the sum of the last k integers for
any 1 ≤ k N , where N , as usual, is the window size.
It is unlikely that we can use the DGIM approach to streams containing both positive
and negative integers. We could have a stream containing both very large positive integers
and very large negative integers, but with a sum in the window that is very close to 0. Any
imprecision in estimating the values of these large integers would have a huge effect on the
estimate of the sum, and so the fractional error could be unbounded.
For example, suppose we broke the stream into buckets as we have done, but represented
the bucket by the sum of the integers therein, rather than the count of 1s. If b is the bucket
that is partially within the query range, it could be that b has, in its first half, very large
negative integers and in its second half, equally large positive integers, with a sum of 0. If
we estimate the contribution of b by half its sum, that contribution is essentially 0. But the
actual contribution of that part of bucket b that is in the query range could be anything from
0 to the sum of all the positive integers. This difference could be far greater than the actual
query answer, and so the estimate would be meaningless.
On the other hand, some other extensions involving integers do work. Suppose that the
stream consists of only positive integers in the range 1 to 2 m for some m . We can treat each
of the m bits of each integer as if it were a separate stream.
We then use the DGIM method to count the 1s in each bit. Suppose the count of the i th
bit (assuming bits count from the low-order end, starting at 0) is c i . Then the sum of the
integers is
Search WWH ::




Custom Search