Database Reference
In-Depth Information
} else if(r > M[j]) {
Z -= Math. pow (2, -M[j]);
Z += Math. pow (2, -r);
M[j] = (byte)r;
return true;
}
return false;
} catch(IOException ex) {
return false;
}
}
Finally, the implementation of the size() method uses the baseline value
Z (what Flajolet calls the indicator function) to produce a corrected estimate
of the cardinality:
public int size() {
double N = alpha/Z;
if(N <= 5.0*m/2.0 && V > 0) {
return (int) (m*Math. log ((double)m/(double)V));
} else if(N > Math. pow (2, 32)/30.0){
return (int) (-Math. pow (2, 32)
*Math. log (1.0 - N/Math. pow (2, 32)));
} else
return (int)N;
}
Improvements to the HyperLogLog Algorithm
The HyperLogLog algorithm was originally designed for scenarios where
the expected cardinality of the set to be measured is on the order of a
billion. With this approximate cardinality in mind, HyperLogLog's creators
introduced a correction to the raw estimate, N, in the implementation of the
size calculation that is used as the raw estimate gets large. This correction
comes into play to account for the increasing probability of hash collisions
and fixes estimates for cardinalities in excess of 140 million or so.
Formostusers,thisrangeofcardinalityismorethansufficient.Applications
operating on the scale of a single website will see cardinalities on the order
of millions. Applications that deal with multiple websites, such as online
Search WWH ::




Custom Search