Embedded GeoComputation - GeoComputation

TABLE 17.3

Demonstration of Seed Control in R

> set.seed(123, kind = "Mersenne-Twister")

> # Generate some uniform random numbers with seed as above

> runif(4)

[1] 0.2875775 0.7883051 0.4089769 0.8830174

> runif(4)

[1] 0.9404673 0.0455565 0.5281055 0.8924190

> set.seed(123, kind = "Mersenne-Twister")

> # Resetting seed should reproduce the first set of numbers

> runif(4)

[1] 0.2875775 0.7883051 0.4089769 0.8830174

were separated, re-processing the Rnw file would recreate the cache. Another related approach is

cacheSweave (Peng, 2010) - an R package providing a number of tools for caching results when

using R in conjunction with Sweave.

Another issue that affects computational reproducibility arises in simulation-based studies: the

use of pseudo-random numbers. Unless the software being used gives explicit control over the

random number generation method and allows a seed to be specified, distinct runs of the same

code will give different results. Fortunately, in R such control is possible via the set.seed

function, which specifies the seed of the pseudo-random number generator and, through its kind

argument, the algorithm used for random number generation. An example is given in Table 17.3.

Here, the numerical seed for the generator is 123, and the algorithm used is the Mersenne twister

(Matsumoto and Nishimura, 1998). After initially setting up the generator, two sets of four uniform

random numbers in the range [0,1] are produced by calling runif(4). After this, the generator is

re-initialised with the same seed. Calling runif(4) again after re-seeding returns the same four

numbers as the first call.
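The behaviour shown in Table 17.3 can also be checked programmatically rather than by eye. The following is a minimal sketch using only base R; the variable names first, second and again are illustrative and do not come from the original text.

```r
# Seed the generator, naming the algorithm explicitly
set.seed(123, kind = "Mersenne-Twister")
first  <- runif(4)   # first set of four uniform numbers
second <- runif(4)   # second set, drawn from the same stream

# Re-seed with the same value and the same algorithm
set.seed(123, kind = "Mersenne-Twister")
again <- runif(4)

identical(first, again)    # TRUE: same seed and algorithm, same numbers
identical(first, second)   # FALSE: the stream has moved on

# Report the generator currently in use, for recording with the results
RNGkind()
```

Recording the output of RNGkind() next to the seed is one simple way of keeping both pieces of information with the analysis.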

Reproducibility here is important: for example, one may wish to test whether the result in

a simulation-based analysis may be an artefact of the choice of random number generator or

of the choice of seed. If this information is embedded in an Rnw file, it is then possible, with

minor edits, to test for stability of the results to such choices. Van Niel and Laffan (2003),

for example, demonstrate the importance of this in a GC context by examining the effect of

changing the random number generator when random perturbations are applied to a digital

elevation model from which slope and flow accumulation are estimated. They conclude by

outlining the importance of reporting the choice of algorithm and seed values when carrying

out studies of this kind.
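A stability check of this kind might be sketched as follows. This is an illustration under stated assumptions, not the procedure used by Van Niel and Laffan: the simulate function is a hypothetical stand-in (a Monte Carlo estimate of pi) for whatever simulation-based analysis is under study, and the particular seeds and generators are arbitrary choices.

```r
# Hypothetical stand-in for a simulation-based analysis:
# a Monte Carlo estimate of pi from points in the unit square
simulate <- function(n = 10000) {
  x <- runif(n)
  y <- runif(n)
  4 * mean(x^2 + y^2 <= 1)
}

# Run the same simulation under a given generator and seed
run_with <- function(kind, seed) {
  set.seed(seed, kind = kind)
  simulate()
}

kinds <- c("Mersenne-Twister", "Wichmann-Hill", "L'Ecuyer-CMRG")
seeds <- c(1, 123, 2024)

# One row per generator, one column per seed
results <- sapply(seeds, function(s) sapply(kinds, run_with, seed = s))
dimnames(results) <- list(kinds, paste0("seed_", seeds))
results

# Switch back to the generator used in Table 17.3
RNGkind("Mersenne-Twister")
```

If the entries of results differ by more than the expected Monte Carlo error, the conclusion may depend on the choice of generator or seed and deserves closer scrutiny.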

A number of further issues relate to reproducibility when using pseudo-random numbers. One

problem with using, for example, Microsoft Excel 2007 when working with random numbers is

that there is no means of specifying the seed for the random number generator - it is therefore

not possible to exactly reproduce simulations in the way set out in the aforementioned example.

A further issue - and perhaps indicative of a far wider issue - is the availability of the source code

used to implement the pseudo-random number generating algorithm. Again, Excel 2007 has to be

considered here as an example. McCullough (2008) considered this application's random number

generator and found a number of issues. In particular, although it is claimed that the generator used

in Excel 2007 is the Wichmann and Hill (1982) algorithm (see Microsoft Knowledge Base Article

828795), extensive investigations by McCullough suggested that this is not the case - and, quoting

from the article:

… Excel users will have to content themselves with an unknown RNG [Random Number Generator] of

unknown period that is not known to pass any standard battery tests for randomness.
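By contrast, a published algorithm can be implemented and inspected directly. The following is a minimal sketch of the Wichmann and Hill (1982) combined generator, assuming the standard multipliers and moduli from the published algorithm (AS 183). The function name wichmann_hill and the default seed values are hypothetical, and nothing here is a claim about what Excel 2007 actually computes.

```r
# Sketch of the Wichmann and Hill (1982) generator (AS 183).
# Three small linear congruential generators are advanced in step,
# and the fractional part of their summed outputs is returned.
wichmann_hill <- function(n, seed = c(1, 2, 3)) {
  s <- as.numeric(seed)   # state: three integers held as doubles
  u <- numeric(n)
  for (i in seq_len(n)) {
    s[1] <- (171 * s[1]) %% 30269
    s[2] <- (172 * s[2]) %% 30307
    s[3] <- (170 * s[3]) %% 30323
    u[i] <- (s[1] / 30269 + s[2] / 30307 + s[3] / 30323) %% 1
  }
  u
}

# The same seed always gives the same stream
identical(wichmann_hill(5), wichmann_hill(5))   # TRUE
```

R also provides a generator of this name, selectable with kind = "Wichmann-Hill" in set.seed or RNGkind. Its seeding differs from this sketch, so the two streams are not expected to match.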
