Database Reference
In-Depth Information
form
h
(
x
) =
ax
+
b
mod 2
k
?
4.5 Estimating Moments
In this section we consider a generalization of the problem of counting distinct elements in
a stream. The problem, called computing “moments,” involves the distribution of frequen-
cies of different elements in the stream. We shall define moments of all orders and concen-
trate on computing second moments, from which the general algorithm for all moments is
a simple extension.
4.5.1
Definition of Moments
Suppose a stream consists of elements chosen from a universal set. Assume the universal
set is ordered so we can speak of the
i
th element for any
i
. Let
m
i
be the number of occur-
rences of the
i
th element for any
i
. Then the
kth-order moment
(or just
k
th moment) of the
stream is the sum over all
i
of (
m
i
)
k
.
the 0th moment is a count of the number of distinct elements in the stream. We can use the
method of
Section 4.4
to estimate the 0th moment of a stream.
The 1st moment is the sum of the
m
i
, which must be the length of the stream. Thus, first
moments are especially easy to compute; just count the length of the stream seen so far.
The 2nd moment is the sum of the squares of the
m
i
. It is sometimes called the
surprise
number
, as it measures the unevenness the distribution of elements in the stream. To see
the distinction, suppose we have a stream of length 100, in which eleven different elements
appear. The most even distribution of these eleven elements would have one appearing 10
times and the other ten appearing 9 times each. In this case, the surprise number is 10
2
+
10 × 9
2
= 910. At the other extreme, one of the eleven elements could appear 90 times and
the other ten appear once each. Then, the surprise number would be 90
2
+ 10 × 1
2
= 8110.
□
As in
Section 4.4
, there is no problem computing moments of any order if we can afford
to keep in main memory a count for each element that appears in the stream. However, also
as in that section, if we cannot afford to use that much memory, then we need to estimate