Database Reference
In-Depth Information
NOTE
Most of the sketches considered in this chapter only work when items
are added to the sketch; they do not support removing items from the
sketch. The authors of the original Count-Min paper call this a Cash
Register model and it assumes that all frequencies are non-negative.
The Count-Min sketch supports this model, but also supports what the
authors call the Turnstile model, in which items can be subtracted from
the set so values could potentially be negative. This chapter is primarily
concerned with Cash Register models because they are more common
in streaming data, but the Turnstile model is also of interest for the
Count-Min sketch.
The implementation of the Count-Min sketch is quite similar to that of the
Counting Bloom Filter. In fact, an extension of the Counting Bloom Filter
called the Spectral Bloom Filter has been shown to be equivalent to the
Count-Min sketch. Like the Counting Bloom Filter, the Count-Min sketch
makes use of an array of counters for its registers. The primary difference
is that each of the k hash functions receives its own set of m registers. This
defines a two-dimensional matrix of register values where a hash function
only updates a particular row. The value of k is commonly considered to be
the “depth” of the sketch whereas the choice of m is considered to be the
“width” of the sketch.
Adding an element to the sketch is very similar to the Counting Bloom Filter
as well. The element is hashed by each hash function h(x), which is then
mappedintomusingtheusualmodulustoidentify register j.Foreachofthe
hash functions, the algorithm increments the register at m ij .
Point Queries
Finding the approximate frequency of an element is also very
straightforward. After applying each of the hash functions, the algorithm
computes min(m 1j ,…,m kj ) and considers this to be the approximate number
of times this element has been added into the set. Although it's somewhat
surprising that this works well enough to use in practice, it acts much like
other sketches in that it relies on the fact that although the probability of a
Search WWH ::




Custom Search