Database Reference
In-Depth Information
the probability of a false positive , as a function of n , the bit-array length, m the number of
members of S , and k , the number of hash functions.
The model to use is throwing darts at targets. Suppose we have x targets and y darts. Any
dart is equally likely to hit any target. After throwing the darts, how many targets can we
expect to be hit at least once? The analysis is similar to the analysis in Section 3.4.2 , and
goes as follows:
• The probability that a given dart will not hit a given target is ( x − 1)/ x .
• The probability that none of the y darts will hit a given target is
We can
write this expression as
• Using the approximation (1 − ϵ ) 1/ ϵ = 1/ e for small ϵ (recall Section 1.3.5 ), we con-
clude that the probability that none of the y darts hit a given target is e −y / x .
EXAMPLE 4.3 Consider the running example of Section 4.3.1 . We can use the above calcu-
lation to get the true expected number of 1s in the bit array. Think of each bit as a target,
and each member of S as a dart. Then the probability that a given bit will be 1 is the prob-
ability that the corresponding target will be hit by one or more darts. Since there are one
billion members of S , we have y = 10 9 darts. As there are eight billion bits, there are x =
8 × 10 9 targets. Thus, the probability that a given target is not hit is e −y / x = e 1/8 and the
probability that it is hit is 1 − e 1/8 . That quantity is about 0.1175. In Section 4.3.1 we sug-
gested that 1/8 = 0.125 is a good approximation, which it is, but now we have the exact
calculation.
We can apply the rule to the more general situation, where set S has m members, the
array has n bits, and there are k hash functions. The number of targets is x = n , and the
number of darts is y = km . Thus, the probability that a bit remains 0 is e k m / n . We want the
fraction of 0 bits to be fairly large, or else the probability that a nonmember of S will hash
at least once to a 0 becomes too small, and there are too many false positives. For example,
we might choose k , the number of hash functions to be n / m or less. Then the probability of
a 0 is at least e 1 or 37%. In general, the probability of a false positive is the probability of
a 1 bit, which is 1 − e km / n , raised to the k th power, i.e., (1 − e km / n ) k .
EXAMPLE 4.4 In Example 4.3 we found that the fraction of 1s in the array of our running
example is 0.1175, and this fraction is also the probability of a false positive. That is, a non-
member of S will pass through the filter if it hashes to a 1, and the probability of it doing
so is 0.1175.
Suppose we used the same S and the same array, but used two different hash functions.
This situation corresponds to throwing two billion darts at eight billion targets, and the
Search WWH ::




Custom Search