Database Reference
In-Depth Information
! EXERCISE 4.4.2 Do you see any problems with the choice of hash functions in Exercise
4.4.1 ? What advice could you give someone who was going to use a hash function of the
form h ( x ) = ax + b mod 2 k ?
4.5 Estimating Moments
In this section we consider a generalization of the problem of counting distinct elements in
a stream. The problem, called computing “moments,” involves the distribution of frequen-
cies of different elements in the stream. We shall define moments of all orders and concen-
trate on computing second moments, from which the general algorithm for all moments is
a simple extension.
4.5.1
Definition of Moments
Suppose a stream consists of elements chosen from a universal set. Assume the universal
set is ordered so we can speak of the i th element for any i . Let m i be the number of occur-
rences of the i th element for any i . Then the kth-order moment (or just k th moment) of the
stream is the sum over all i of ( m i ) k .
EXAMPLE 4.6 The 0th moment is the sum of 1 for each m i that is greater than 0. 4 That is,
the 0th moment is a count of the number of distinct elements in the stream. We can use the
method of Section 4.4 to estimate the 0th moment of a stream.
The 1st moment is the sum of the m i , which must be the length of the stream. Thus, first
moments are especially easy to compute; just count the length of the stream seen so far.
The 2nd moment is the sum of the squares of the m i . It is sometimes called the surprise
number , as it measures the unevenness the distribution of elements in the stream. To see
the distinction, suppose we have a stream of length 100, in which eleven different elements
appear. The most even distribution of these eleven elements would have one appearing 10
times and the other ten appearing 9 times each. In this case, the surprise number is 10 2 +
10 × 9 2 = 910. At the other extreme, one of the eleven elements could appear 90 times and
the other ten appear once each. Then, the surprise number would be 90 2 + 10 × 1 2 = 8110.
As in Section 4.4 , there is no problem computing moments of any order if we can afford
to keep in main memory a count for each element that appears in the stream. However, also
as in that section, if we cannot afford to use that much memory, then we need to estimate
Search WWH ::




Custom Search