Database Reference
In-Depth Information
EXAMPLE 4.8 Consider the three variables from Example 4.7 . From X 1 we derive the es-
timate n × (2 × X 1 .value − 1) = 15 × (2 × 3 − 1) = 75. The other two variables, X 2 and X 3 ,
each have value 2 at the end, so their estimates are 15 × (2 × 2 − 1) = 45. Recall that the
true value of the second moment for this stream is 59. On the other hand, the average of the
three estimates is 55, a fairly close approximation.
4.5.3
Why the Alon-Matias-Szegedy Algorithm Works
We can prove that the expected value of any variable constructed as in Section 4.5.2 is the
second moment of the stream from which it is constructed. Some notation will make the
argument easier to follow. Let e ( i ) be the stream element that appears at position i in the
stream, and let c ( i ) be the number of times element e ( i ) appears in the stream among posi-
tions i, i + 1 , . . . , n .
EXAMPLE 4.9 Consider the stream of Example 4.7 . e (6) = a , since the 6th position holds a .
Also, c (6) = 4, since a appears at positions 9, 13, and 14, as well as at position 6. Note that
a also appears at position 1, but that fact does not contribute to c (6).
The expected value of 2× X.value +1 is the average over all positions i between 1 and n
of n × (2 × c ( i ) − 1), that is
We can simplify the above by canceling factors 1/ n and n , to get
However, to make sense of the formula, we need to change the order of summation by
grouping all those positions that have the same element. For instance, concentrate on some
element a that appears m a times in the stream. The term for the last position in which a ap-
pears must be 2 × 1 − 1 = 1. The term for the next-to-last position in which a appears is 2 ×
2 − 1 = 3. The positions with a before that yield terms 5, 7, and so on, up to 2 m a − 1, which
is the term for the first position in which a appears. That is, the formula for the expected
value of 2 × X.value + 1 can be written:
Note that 1 + 3 + 5 + · · · + (2 m a − 1) = ( m a ) 2 . The proof is an easy induction on the
number of terms in the sum. Thus, E (2× X.value +1) = ∑ a ( m a ) 2 , which is the definition of
the second moment.
Search WWH ::




Custom Search