Statistical Approximation of Streaming Data - Real-Time Analytics

Database Reference

In-Depth Information

The Reservoir Algorithm

Reservoir algorithms are a class of algorithms developed to sample from

streaming populations. They are closely related in concept to the

Fisher-Yatesshufflefromthelastsection.Thisismosteasilyseeninanearly

version of the Fisher-Yates shuffle:

public static <E> void durstenfeldPermute(E[] array) {

for ( int i=array.length-1;i>0;i--) {

int j = ( int ) Math . floor (i*rng.nextDouble());

E temp = array[i];array[i] = array[j];array[j] =

temp;

}

Inthisversionofthealgorithm,theindexisshuffledfrombacktofront.This

means that elements from farther out in the array are swapped into earlier

sections. It is this notion that leads to the basic reservoir algorithm.

In the reservoir algorithm, an array of elements that form the reservoir is

first allocated. This takes the role of the first n elements of the array when

sampling without replacement:

public class Reservoir<E> implements List<E> {

MersenneTwister rng = new MersenneTwister();

Object[] reservoir;

int N;

public Reservoir( int n) {

super ();

reservoir = new Object[n];

N = 0;

}

The first reservoir.length elements can simply be added to the array.

After that, a random position is drawn between N and 0. If that position

falls into the array, it replaces the element currently there; otherwise, it is

ignored:

public boolean add(E arg0) {

if (N < reservoir.length) {

Search WWH ::

Custom Search

Home