Database Reference
In-Depth Information
Another approach is to maintain a single sample that will tend to contain
newer elements by replacing elements in the sample with newer ones
according to the relative age of the two elements. This section describes
one method that was presented by Charu Aggarwal (“On Biased Reservoir
Sampling in the Presence of Stream Evolution,” VLDB Conference, 2006).
This method begins with the definition of a bias rate p , which has the same
functional form as the exponential distribution. The algorithm defines a
reservoir by the inverse of the bias rate:
public class BiasedReservoir<E> {
MersenneTwister rng = new MersenneTwister();
ArrayList<E> reservoir = new ArrayList<E>();
double rate;
double size,N;
public BiasedReservoir( double rate) {
this .rate = rate;
size = Math. ceil (1.0/rate);
}
Note that the reservoir is not actually an array as it was in the original
formulation. This is because the number of elements in the array will be
allowed to vary somewhat over time. To implement the add() method,
first an element is removed with the probability proportional to how many
elements are in the reservoir versus the size. If the number of elements is
equal to the size, then an element is always removed. The element to be
inserted is then added to the array:
public void add(E obj) {
if (rng.nextDouble() < (( double )reservoir.size())/
size) {
reservoir.remove(rng.nextInt() % reservoir.size());
}
reservoir.add(obj);
}
Aggarwal is able to show that this simple algorithm produces exponentially
biased samples with a rate equal to the “size” of the reservoir. The primary
drawback of this technique is that it can take some time to fill the reservoir
Search WWH ::




Custom Search