Database Reference
In-Depth Information
Top-K and “Heavy Hitters”
AcommonCount-Minapplicationismaintaininglistsoffrequentitems.The
two basic forms of this list are the top-K list and the so-called Heavy Hitters
list. The former is simply a list of the k most common items in the data
stream, whereas the latter are the items with frequencies higher than some
predetermined value f.
The basic implementation of either sort uses a Count-Min sketch to store
the frequencies and a heap structure to hold the top values. In Java, a
suitable heap can be implemented using a PriorityQueue with a suitable
Comparator implementation, as in this example:
public class SketchComparator<T extends
Serializable> implements Comparator<T> {
CountMinSketch<T> sketch;
public SketchComparator(CountMinSketch<T> sketch) {
this.sketch = sketch;
}
public int compare(T o1, T o2) {
return (int)(sketch.get(o1) - sketch.get(o2));
}
}
To implement a top-K list, all you need is to increment an incoming element
and then add it to the priority queue. If the queue is larger than k (it can be
at most k+1 elements), remove the smallest element to return it to size k:
public class TopList<E extends Serializable> {
CountMinSketch<E> sketch;
PriorityQueue<E> heap;
int k = Integer. MAX_VALUE ;
public TopList(int k,CountMinSketch<E> sketch) {
this.sketch = sketch;
this.k = k;
this.heap = new PriorityQueue<E>(k+1,new
SketchComparator<E>(sketch));
}
public void add(E element) {
sketch.increment(element);
Search WWH ::




Custom Search