Database Reference
In-Depth Information
the k th moment by keeping a limited number of values in main memory and computing
an estimate from these values. For the case of distinct elements, each of these values were
counts of the longest tail produced by a single hash function. We shall see another form of
value that is useful for second and higher moments.
4.5.2
The Alon-Matias-Szegedy Algorithm for Second
Moments
For now, let us assume that a stream has a particular length n . We shall show how to deal
with growing streams in the next section. Suppose we do not have enough space to count
all the m i for all the elements of the stream. We can still estimate the second moment of
the stream using a limited amount of space; the more space we use, the more accurate the
estimate will be. We compute some number of variables . For each variable X , we store:
(1) A particular element of the universal set, which we refer to as X.element , and
(2) An integer X.value , which is the value of the variable. To determine the value of a vari-
able X , we choose a position in the stream between 1 and n , uniformly and at random.
Set X.element to be the element found there, and initialize X.value to 1. As we read the
stream, add 1 to X.value each time we encounter another occurrence of X.element .
EXAMPLE 4.7 Suppose the stream is a, b, c, b, d, a, c, d, a, b, d, c, a, a, b . The length of
the stream is n = 15. Since a appears 5 times, b appears 4 times, and c and d appear three
times each, the second moment for the stream is 5 2 +4 2 +3 2 +3 2 = 59. Suppose we keep
three variables, X 1 , X 2 , and X 3 . Also, assume that at “random” we pick the 3rd, 8th, and
13th positions to define these three variables.
When we reach position 3, we find element c , so we set X 1 .element = c and X 1 .value = 1.
Position 4 holds b , so we do not change X 1 . Likewise, nothing happens at positions 5 or 6.
At position 7, we see c again, so we set X 1 .value = 2.
At position 8 we find d , and so set X 2 .element = d and X 2 .value = 1. Positions 9 and 10
hold a and b , so they do not affect X 1 or X 2 . Position 11 holds d so we set X 2 .value = 2, and
position 12 holds c so we set X 1 .value = 3. At position 13, we find element a , and so set
X 3 .element = a and X 3 .value = 1. Then, at position 14 we see another a and so set X 3 .value
= 2. Position 15, with element b does not affect any of the variables, so we are done, with
final values X 1 .value = 3 and X 2 .value = X 3 .value = 2.
We can derive an estimate of the second moment from any variable X . This estimate is n
× (2 × X.value − 1).
Search WWH ::




Custom Search