Database Reference
In-Depth Information
collision of one hash function might be too high, the probability that several
hashes will collide is small enough to be useful.
Clearly, this approach tends to overestimate the approximate frequency of
a value under the Cash Register model. Another element could (and will,
when enough elements are entered) increment the same counter, much like
the Bloom Filter. So, how badly does the Count-Min sketch overestimate
this count?
In the original Count-Min paper, the authors show that the probability that
the estimate of an element is between its true value x and an upper bound
x + em, where m is the number of elements entered into a map, is larger
than 1-d under two conditions. The first condition is that the width of the
sketch is 2/e. The second condition is that the depth of the sketch is (log
1/d)/log 2. So, a Count-Min sketch where the estimate is within 5 percent
of the sum with a 99 percent probability would have a width of 40 and a
depth of 7. A depth of 8 with a width of 128 would have a relative error of
approximately 1.5 percent with a probability of approximately 99.6 percent.
Using 32-bit counters, the Count-Min sketch requires the same amount of
space as a HyperLogLog sketch with b=12.
Count-Min Sketch Implementation
Ratherthanimplementing a Set<E> ,theCount-Min sketchimplements the
Java class Map<E,Long> albeit with some restrictions.
public class CountMinSketch<E extends Serializable>
implements Map<E,Long> {
int width = 0;
int depth = 0;
byte[] m;
SerializableHasher[] hashes;
ObjectOutputStream[] outputs;
protected void initialize(int m,int[] seeds) throws
IOException {
hashes = new SerializableHasher[seeds.length];
outputs= new ObjectOutputStream[seeds.length];
for(int i=0;i<seeds.length;i++) {
hashes[i] = (new
SerializableHasher()).seed(seeds[i]);
Search WWH ::




Custom Search