Databases Reference
In-Depth Information
A General Stream-Sampling Problem
Notice that the technique described in Section 4.5.5 actually solves a more
general problem. It gives us a way to maintain a sample of s stream
elements so that at all times, all stream elements are equally likely to be
selected for the sample.
As an example of where this technique can be useful, recall that in
Section 4.2 we arranged to select all the tuples of a stream having key
value in a randomly selected subset. Suppose that, as time goes on, there
are too many tuples associated with any one key. We can arrange to limit
the number of tuples for any key K to a fixed constant s by using the
technique of Section 4.5.5 whenever a new tuple for key K arrives.
! Exercise 4.5.2 : If a stream has n elements, of which m are distinct, what are
the minimum and maximum possible surprise number, as a function of m and
n?
Exercise 4.5.3 : Suppose we are given the stream of Exercise 4.5.1, to which
we apply the Alon-Matias-Szegedy Algorithm to estimate the surprise number.
For each possible value of i, if X i is a variable starting position i, what is the
value of X i .value?
Exercise 4.5.4 : Repeat Exercise 4.7 if the intent of the variables is to compute
third moments. What is the value of each variable at the end? What estimate
of the third moment do you get from each variable? How does the average of
these estimates compare with the true value of the third moment?
Exercise 4.5.5 : Prove by induction on n that 1 + 3 + 5 ++ (2m−1) = m 2 .
Exercise 4.5.6 : If we wanted to compute fourth moments, how would we
convert X.value to an estimate of the fourth moment?
4.6
Counting Ones in a Window
We now turn our attention to counting problems for streams. Suppose we have
a window of length N on a binary stream. We want at all times to be able to
answer queries of the form “how many 1's are there in the last k bits?” for any
k≤N . As in previous sections, we focus on the situation where we cannot
afford to store the entire window. After showing an approximate algorithm for
the binary case, we discuss how this idea can be extended to summing numbers.
Search WWH ::




Custom Search