Database Reference
In-Depth Information
count += sketches.get(d.level).get(d.start);
}
return count;
}
The computation of quantiles, such as the median, is accomplished via a
binary search of the space. It is unlikely that the exact probability is in the
set, but the algorithm returns the closest value to the appropriate quantile.
public int quantile(double q){
if(q == 0.0) return min;
if(q == 1.0) return max;
int lo = min;
int hi = max;
while(lo <= hi) {
int mid = lo + (hi - lo)/2;
double val = frequency(min,mid);
if(val < q) hi = mid - 1;
else if(val > q) lo = mid + 1;
else return mid;
}
return lo; //Return the nearest value
}
Other Applications
This chapter provides an introduction to a class of dimension reduction
tools with similar properties. Those properties are, essentially, storage and
updates that do not directly depend on the number of elements to be
considered. This allows for streaming applications to maintain control over
processing time, which is the key to high-performance, low-latency
applications.
Although the focus in this topic is on streaming applications, these
algorithms are also generally applicable. In particular, these data structures
are often useful in Map-Reduce applications where the data coming into the
map and reduce phases behaves very much like streaming data.
For example, the Bloom Filter is often used in filtering applications to do a
rough filtering of data at the Map step before doing the final trimming in
Search WWH ::




Custom Search