Database Reference
In-Depth Information
point estimates all have some error. Adding them together, as discussed in
Chapter 9, means that their errors increase with each addition. In the best
case, you'd expect the error of that median calculation to be 1,500 times
larger than any single point estimate.
A less naïve solution is to find some way to reduce the total number of
point queries required to compute the frequency of the range. One approach
is to use multiple Count-Min sketches that store all the data at different
resolutions. The first sketch is the basic sketch that stores the point queries.
The second sketch halves the resolution by combining the first and second
elements into one counter, and so on. The third sketch combines the first
twoelementsofthesecondsketchsoitsfirstelementisthecountofthefirst,
second, third, and fourth elements in the original sketch. The resolution is
halved until the last sketch contains only two elements.
Doing this allows any range query to be divided into at most 2×lg(N)
intervals, where N is the total size of the domain. For example, if the domain
is the space of 32-bit integers, then at most 64 subintervals are needed to
compute any range query. Compared to the O(n) compute time of the naïve
implementation, this is quite an improvement.
Dyadic Intervals
The intervals described in the last section are known as dyadic intervals .
Dyadic intervals are defined by two parameters: a level parameter and a
start parameter:
public class DyadicInterval {
public int level = 0;
public int start = 0;
public DyadicInterval() { }
public DyadicInterval(int level,int start) {
this.level = level;
this.start = start;
}
These two parameters define the start and end points of the interval as
[2 level ×start, 2 level ×(start+1) - 1]:
public int min() { return (1 << level)*start; }
public int max() { return (1 << level)*(start+1) - 1; }
Search WWH ::




Custom Search