Databases Reference
In-Depth Information
F
Estimating the Number of 1's in a Window : We can estimate the number
of 1's in a window of 0's and 1's by grouping the 1's into buckets. Each
bucket has a number of 1's that is a power of 2; there are one or two
buckets of each size, and sizes never decrease as we go back in time. If
we record only the position and size of the buckets, we can represent the
contents of a window of size N with O(log 2 N ) space.
F
Answering Queries About Numbers of 1's: If we want to know the approx-
imate numbers of 1's in the most recent k elements of a binary stream,
we find the earliest bucket B that is at least partially within the last k
positions of the window and estimate the number of 1's to be the sum of
the sizes of each of the more recent buckets plus half the size of B. This
estimate can never be off by more that 50% of the true count of 1's.
F
Closer Approximations to the Number of 1's: By changing the rule for
how many buckets of a given size can exist in the representation of a
binary window, so that either r or r−1 of a given size may exist, we can
assure that the approximation to the true number of 1's is never off by
more than 1/r.
F
Exponentially Decaying Windows: Rather than fixing a window size, we
can imagine that the window consists of all the elements that ever arrived
in the stream, but with the element that arrived t time units ago weighted
by e −ct for some time-constant c. Doing so allows us to maintain certain
summaries of an exponentially decaying window easily. For instance, the
weighted sum of elements can be recomputed, when a new element arrives,
by multiplying the old sum by 1−c and then adding the new element.
F
Maintaining Frequent Elements in an Exponentially Decaying Window :
We can imagine that each item is represented by a binary stream, where
0 means the item was not the element arriving at a given time, and 1
means that it was. We can find the elements whose sum of their binary
stream is at least 1/2. When a new element arrives, multiply all recorded
sums by 1 minus the time constant, add 1 to the count of the item that
just arrived, and delete from the record any item whose sum has fallen
below 1/2.
4.9
References for Chapter 4
Many ideas associated with stream management appear in the “chronicle data
model” of [8]. An early survey of research in stream-management systems is
[2]. Also, [6] is a recent topic on the subject of stream management.
The sampling technique of Section 4.2 is from [7]. The Bloom Filter is
generally attributed to [3], although essentially the same technique appeared as
“superimposed codes” in [9].
Search WWH ::




Custom Search