Database Reference
In-Depth Information
counters[j] = C;
} catch(IOException e) {
}
}
return true;
}
The calculation of set membership and estimates of the size of the filter
remain the same.
The downside of this approach is that it introduces the ability to produce
false negatives. To minimize the probability of a false negative in practice,
the authors of the original paper conducted a number of empirical
experiments using different values of d and C . The general rule of thumb
is to keep C as small as possible, because the larger C gets the larger d
needs to be. This introduces more steps in each update and slows the overall
algorithm. After a size for the counters has been established, the number of
elements to decrement is set via empirical testing of the sketch.
Distinct Value Sketches
The Bloom Filter is a fairly flexible algorithm for approximating set
membership and cardinality. To only approximate the cardinality of the set
it is possible to improve the Bloom Filter's error while decreasing memory
requirements using a specific Distinct Value (DV) Sketch. A variety of DV
sketch algorithms have been developed over the years. The various
approaches usually take on one of two different flavors, and this section
discusses one of each type.
The Min-Count Algorithm
The first type category, represented by the Min-Count algorithm (Frédéric
Giroire, “Order Statistics and Estimating Cardinalities of massive Data
Sets,” Discrete Applied Mathematics, 2009, Vol 157 Issue 2 (January),
406-427), relies on the distribution of order statistics. To understand how
this works, recall the earlier discussion of hash functions. A good hash
function takes an input value and maps it into a space of 2 p values such that
the distribution of values is uniform in [0,2 p -1].
Search WWH ::




Custom Search