Database Reference
In-Depth Information
This uniform distribution gives the probability that an input will hash to a
value smaller than x as x/2 p . Assuming n unique hash values S 1 ,..,S n have
been observed, take X to be the smallest of these values. The probability that
X is at least as small as some value x is 1 minus the probability that all the
values of S are larger than x:
Because the distributions of all the S values are the same, this simplifies to:
Using this to compute the expected value of X in terms of n yields:
If M is then the observed minimum after inserting n elements, the Delta
Method from Chapter 9 allows for the calculation of E[1/X] as
approximately equal to 1/E[X]. Solving for n in terms of M yields:
This approximation is easy to calculate, but has an obvious problem that M
canbe0wheretheDeltaMethodcalculationisnotdefined.Additionally,the
variance of such approximations is often higher than desired. To get around
these problems, Min-Count introduces two techniques to improve the error
of the approximation as well as deal with the fact that the approximation
blows up when M gets very small.
Taking the second problem first, the algorithm deals with the problem in a
very simple way. Instead of simply recording the minimum, the algorithm
keeps track of the k smallest values and then performs its computations on
the k th smallest value, which does have a defined expectation. This requires
k more storage than the minimum, but k is usually a relatively small
number. In fact, the simple version of the algorithm works best when k = 3.
Search WWH ::




Custom Search