Database Reference
In-Depth Information
bits-per-element c, and then compute the size of the bit array as well as the
number of hash functions from that. The total size of the array given a target
number of elements is m = cn. The number of hash functions to use is k =
c × ln 2. The expected false positive rate can be quickly approximated as p
= 0.6185 c . For example, 8 bits per element yields k = 6 and an approximate
false positive rate of 2.14 percent.
This may not seem like much improvement, but in real-world applications it
can be enormous. For example, domain names can be at most 63 characters
long with 3 characters as the practical minimum length. Anecdotal evidence
suggests that domain names have a median length of about 10 characters.
An application that needs to track a million domain names would need
somewhere around 10MB of storage. A Bloom Filter to track those same
million domains with an error smaller than 2.5 percent would require less
than 1MB.
Unions and Intersections
Bloom Filters have the nice property that, if they have been constructed
using the same hash functions, it is possible to construct unions and
intersections of the two filters using bitwise OR and bitwise AND operators
respectively.
In the case of the union, this operation is exactly equivalent to constructing
the set directly with the same false positive rates as if the filter had been
constructed directly from the union of the original sets. This is not
surprising since the jth operation is essentially the union of a Bloom Filter
containing a single element with the existing Bloom Filter containing the
previousj-1inserts.Thisisconvenientasitallowsfilterstobeconstructedin
a distributed fashion and later combined through the exchange of the much
smaller filters.
The intersection of two Bloom Filters presents a more complicated scenario.
In this case, the intersection of the two filters will have a false positive rate
that is at most the largest of the false positive probabilities of the original
Bloom Filters, which may be larger than the false positive probability of the
intersection had it been constructed directly from the intersection of the
original sets.
Search WWH ::




Custom Search