Databases Reference
In-Depth Information
N , which is the number of most recent tickets that would be considered in
evaluating popularity. Then, use the method of Section 4.6 to estimate the
number of tickets for each movie, and rank movies by their estimated counts.
This technique might work for movies, because there are only thousands of
movies, but it would fail if we were instead recording the popularity of items
sold at Amazon, or the rate at which different Twitter-users tweet, because
there are too many Amazon products and too many tweeters. Further, it only
offers approximate answers.
4.7.2 Definition of the Decaying Window
An alternative approach is to redefine the question so that we are not asking
for a count of 1's in a window. Rather, let us compute a smooth aggregation of
all the 1's ever seen in the stream, with decaying weights, so the further back
in the stream, the less weight is given. Formally, let a stream currently consist
of the elements a 1 , a 2 , . . . , a t , where a 1 is the first element to arrive and a t is
the current element. Let c be a small constant, such as 10 −6 or 10 −9 . Define
the exponentially decaying window for this stream to be the sum
t−1
a t−i (1−c) i
i=0
The effect of this definition is to spread out the weights of the stream el-
ements as far back in time as the stream goes. In contrast, a fixed window
with the same sum of the weights, 1/c, would put equal weight 1 on each of the
most recent 1/c elements to arrive and weight 0 on all previous elements. The
distinction is suggested by Fig. 4.4.
Window of
length 1/c
Figure 4.4: A decaying window and a fixed-length window of equal weight
It is much easier to adjust the sum in an exponentially decaying window
than in a sliding window of fixed length. In the sliding window, we have to
worry about the element that falls out of the window each time a new element
arrives. That forces us to keep the exact elements along with the sum, or to use
an approximation scheme such as DGIM. However, when a new element a t+1
arrives at the stream input, all we need to do is:
1. Multiply the current sum by 1−c.
Search WWH ::




Custom Search