Database Reference
In-Depth Information
Cardinality Estimation
Although there are more efficient methods of maintaining an estimate of a
set's cardinality, the Bloom Filter can also approximate cardinality.
The basic trick is to go back to the probability that a particular bit is set to
1. When choosing an optimal filter size and number of hash functions, the
probability that a bit was set was p = 1-e -kn/m .
Assuming that the probability of each bit being set to 1 is independent,
the register array can be considered m independent, identically distributed
Bernoulli trials. This is the same as flipping a coin m times and counting
how many coins come up heads.
After inserting n elements, the probability of a 1 can be estimated by the
number of 1 bits x divided by the total number of bits m. Plugging x/m into
the equation and solving for n yields:
A slightly better estimate can be obtained by not using the approximation
from the earlier analysis. This yields a somewhat more complicated
expression for n:
However, practically speaking, the difference between -m/k and 1/k ×
log(1-1/m) is going to be quite small.
The naïve implementation of the size() method of the Bloom Filter Set
simply uses BitSet 's native cardinality() method:
public int size() {
int X = bits.cardinality();
return (int) (Math. log (1 - (double)X/
(double)bits.length())/
(Math. log (1.0 - 1.0/
(double)bits.length())*(double)hashes.length));
}
Search WWH ::




Custom Search