Databases Reference
In-Depth Information
of filters, each of which would eliminate 7/8th of the remaining spam.
4.3.2 The Bloom Filter
A Bloom filter consists of:
1. An array of n bits, initially all 0's.
2. A collection of hash functions h 1 , h 2 , . . . , h k . Each hash function maps
“key” values to n buckets, corresponding to the n bits of the bit-array.
3. A set S of m key values.
The purpose of the Bloom filter is to allow through all stream elements whose
keys are in S, while rejecting most of the stream elements whose keys are not
in S.
To initialize the bit array, begin with all bits 0. Take each key value in S
and hash it using each of the k hash functions. Set to 1 each bit that is h i (K)
for some hash function h i and some key value K in S.
To test a key K that arrives in the stream, check that all of
h 1 (K), h 2 (K), . . . , h k (K)
are 1's in the bit-array. If all are 1's, then let the stream element through. If
one or more of these bits are 0, then K could not be in S, so reject the stream
element.
4.3.3 Analysis of Bloom Filtering
If a key value is in S, then the element will surely pass through the Bloom
filter. However, if the key value is not in S, it might still pass. We need to
understand how to calculate the probability of a false positive, as a function of
n, the bit-array length, m the number of members of S, and k, the number of
hash functions.
The model to use is throwing darts at targets. Suppose we have x targets
and y darts. Any dart is equally likely to hit any target. After throwing the
darts, how many targets can we expect to be hit at least once? The analysis is
similar to the analysis in Section 3.4.2, and goes as follows:
•The probability that a given dart will not hit a given target is (x−1)/x.
y .
x−1
x
•The probability that none of the y darts will hit a given target is
We can write this expression as (1− x ) x( x ) .
•Using the approximation (1−ǫ) 1/ǫ = 1/e for small ǫ (recall Section 1.3.5),
we conclude that the probability that none of the y darts hit a given target
is e −y/x .
Search WWH ::




Custom Search