Database Reference
In-Depth Information
reservoir[N] = (Object)arg0;
} else {
long k = ( long )Math. floor (N*rng.nextDouble());
if (k < ( long )reservoir.length) reservoir[( int )k]
= (Object)arg0;
}
N++;
return true ;
}
This is effectively the same thing that the Durstenfeld version of the
Fisher-Yates algorithm did when it moved backward throughout the array.
The proof is not given here, but this ensures that each element in the stream
has an equal probability of being in the sample at every point in the analysis.
Biased Streaming Sampling
The basic reservoir algorithm, also known as Algorithm R (the name given
in Knuth's “The Art of Computer Programming”), ensures a uniform sample
from the beginning of time until the time the stream stops. This is largely
due to its original application, which was to sample from a “stream” in the
sense of only being allowed a single pass over an otherwise finite dataset.
This makes the algorithm well suited to providing samples in a batch setting
where Map-Reduce is being used to process a finite dataset.
It is less interesting in the streaming sense used by this topic, which is
more interested in the dynamics of the sample over time. In fact, as the
sample ages, it is less and less likely that a recent data point will actually
be included in the sample (the probability that a random number between
[0,N] is smaller than n gets smaller as N increases). Instead, it would be
preferable if the algorithm could be biased toward including newer events in
preference to older events.
Sliding Window Reservoir Sampling
A simple method for introducing time dynamics into reservoir sampling is
to maintain a sliding window of samples. In this implementation, a number
of reservoir “buckets” are maintained in a linked list:
public class SlidingWindow<E> {
int n,k,max;
Search WWH ::




Custom Search