Databases Reference
In-Depth Information
at 0) is c i . Then the sum of the integers is
m−1
c i 2 i
i=0
If we use the technique of Section 4.6.6 to estimate each c i with fractional error
at most ǫ, then the estimate of the true sum has error at most ǫ. The worst
case occurs when all the c i 's are overestimated or all are underestimated by the
same fraction.
4.6.8
Exercises for Section 4.6
Exercise 4.6.1 : Suppose the window is as shown in Fig. 4.2. Estimate the
number of 1's the the last k positions, for k = (a) 5 (b) 15. In each case, how
far off the correct value is your estimate?
! Exercise 4.6.2 : There are several ways that the bit-stream 1001011011101
could be partitioned into buckets. Find all of them.
Exercise 4.6.3 : Describe what happens to the buckets if three more 1's enter
the window represented by Fig. 4.3.
You may assume none of the 1's shown
leave the window.
4.7
Decaying Windows
We have assumed that a sliding window held a certain tail of the stream, either
the most recent N elements for fixed N , or all the elements that arrived after
some time in the past. Sometimes we do not want to make a sharp distinction
between recent elements and those in the distant past, but want to weight
the recent elements more heavily. In this section, we consider “exponentially
decaying windows,” and an application where they are quite useful: finding the
most common “recent” elements.
4.7.1 The Problem of Most-Common Elements
Suppose we have a stream whose elements are the movie tickets purchased all
over the world, with the name of the movie as part of the element. We want
to keep a summary of the stream that is the most popular movies “currently.”
While the notion of “currently” is imprecise, intuitively, we want to discount
the popularity of a movie like Star Wars-Episode 4, which sold many tickets,
but most of these were sold decades ago. On the other hand, a movie that sold
n tickets in each of the last 10 weeks is probably more popular than a movie
that sold 2n tickets last week but nothing in previous weeks.
One solution would be to imagine a bit stream for each movie, and give it
value 1 if the ticket is for that movie, and 0 otherwise.
Pick a window size
Search WWH ::




Custom Search