Database Reference
In-Depth Information
pen-and-paper method in the late 1930s. It is very simple to implement as
an in-place shuffling method:
public static <E> void permute(E[] array) {
for ( int i=0;i<array.length;i++) {
int j = i + ( int )Math. floor ((array.length -
i)*rng.nextDouble());
E temp = array[i];array[i] = array[j];array[j] =
temp;
}
}
This method will produce all n! possible permutations of the array with
equal probability. In the preceding algorithm, a value will only be swapped
no more than once, so a sampler without replacement can be implemented
by running the shuffle algorithm over the first n elements and returning the
first n elements to use as the sample:
public static <E> E[] withoutReplacement(E[] array, int
n) {
for ( int i=0;i<n;i++) {
int j = i
+ ( int )Math. floor ((array.length - i)*rng.nextDouble());
E temp = array[i];array[i] = array[j];array[j] =
temp;
}
return Arrays. copyOf (array, n);
}
Sampling from a Streaming Population
The Fisher-Yates shuffle works well when the total size of the population is
known and fits into an array held in RAM. When the array cannot be held in
RAM, other techniques were developed to allow for sampling from the data
in a single pass. These methods form the basis of sampling from a streaming
population where the complete dataset is somewhere between “very large”
and “infinite.”
Search WWH ::




Custom Search