Biology Reference
In-Depth Information
or the performance of discriminant function or canonical variates analysis. In these appli-
cations, some portion of the data (anywhere from 1 specimen to 50% of the data) is held
aside as a test set, while the model or discriminant function is fitted to the remainder of
the data, which is designated as the training data. The quality of the fit of the model, or
the performance of the discriminant function, is then evaluated on the test data. This
approach yields an estimate of the performance of the model, if it were to be used on new
data. Cross-validation is particularly helpful in detecting overfitting of models.
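The hold-out procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used by any particular morphometrics package: the two-group data are synthetic, a simple nearest-centroid rule stands in for a discriminant function, and the names nearest_centroid_fit, nearest_centroid_predict and holdout_accuracy are hypothetical.

```python
import random

def nearest_centroid_fit(rows, labels):
    """Return the mean vector (centroid) of each group in the training data."""
    sums, counts = {}, {}
    for row, label in zip(rows, labels):
        acc = sums.setdefault(label, [0.0] * len(row))
        for i, value in enumerate(row):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def nearest_centroid_predict(centroids, row):
    """Assign a specimen to the group whose centroid is closest."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(row, centroid))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))

def holdout_accuracy(rows, labels, test_fraction=0.3, seed=0):
    """Fit on the training portion, then score on the held-out test portion."""
    rng = random.Random(seed)
    order = list(range(len(rows)))
    rng.shuffle(order)
    n_test = max(1, round(test_fraction * len(rows)))  # at least 1 specimen
    test_idx, train_idx = order[:n_test], order[n_test:]
    centroids = nearest_centroid_fit([rows[i] for i in train_idx],
                                     [labels[i] for i in train_idx])
    hits = sum(nearest_centroid_predict(centroids, rows[i]) == labels[i]
               for i in test_idx)
    return hits / n_test

# Synthetic, well-separated groups (assumed data, for illustration only).
rng = random.Random(42)
rows = ([[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(20)] +
        [[rng.gauss(10, 1), rng.gauss(10, 1)] for _ in range(20)])
labels = ["A"] * 20 + ["B"] * 20

print(holdout_accuracy(rows, labels, test_fraction=0.3))
```

Because the classifier never sees the test specimens during fitting, the returned proportion estimates performance on new data; a model that fits the training data far better than the test data is probably overfitted.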
Monte Carlo Methods
Monte Carlo methods compare the value of an observed statistic to the range of values
expected under a given null hypothesis, assuming a model of the populations involved.
Like analytical statistical methods, Monte Carlo methods require making assumptions
about the nature of the distributions from which the samples are drawn. They then fit the
parameters of the distributional models to the observed samples. In contrast, analytic statistical
approaches use algebraic derivations to estimate the values of statistics (and standard
errors in those statistics) based on the nature of the underlying distributions. The distinction
is that Monte Carlo approaches generate random data sets based on the parameters
and distribution of the model; each random data set is drawn from the model distribution
and has the same sample size as the original sample. The distribution of the statistic of interest
(estimated over many computer-generated Monte Carlo sets) is used to estimate the mean
and standard deviation of that statistic under the null model and the model distribution
used. Monte Carlo methods can be used both for hypothesis testing and for generating
confidence intervals.
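As a rough sketch of the procedure just described, the following Python code tests a difference in means under a normal model. The choice of a normal model fitted to the pooled data, the two small samples, the function name monte_carlo_mean_difference, and the (count + 1)/(n + 1) convention for the p-value are all assumptions made for illustration.

```python
import random
import statistics

def monte_carlo_mean_difference(x, y, n_sets=5000, seed=1):
    """Compare the observed difference in means with its simulated null range."""
    rng = random.Random(seed)
    observed = statistics.mean(x) - statistics.mean(y)

    # Fit the assumed model: under the null hypothesis both samples come from
    # one normal distribution, so its parameters are estimated from the pooled data.
    pooled = list(x) + list(y)
    mu = statistics.mean(pooled)
    sigma = statistics.stdev(pooled)

    # Generate random data sets with the same sample sizes as the originals
    # and record the statistic of interest for each one.
    simulated = []
    for _ in range(n_sets):
        sim_x = [rng.gauss(mu, sigma) for _ in x]
        sim_y = [rng.gauss(mu, sigma) for _ in y]
        simulated.append(statistics.mean(sim_x) - statistics.mean(sim_y))

    # Two-tailed p-value: proportion of simulated differences at least as
    # extreme as the observed one (the +1 terms count the observed value itself).
    extreme = sum(abs(d) >= abs(observed) for d in simulated)
    p_value = (extreme + 1) / (n_sets + 1)
    return observed, statistics.mean(simulated), statistics.stdev(simulated), p_value

# Hypothetical measurements, for illustration only.
sample_a = [4.1, 5.0, 3.8, 4.6, 5.2, 4.4]
sample_b = [5.9, 6.3, 5.1, 6.8, 6.0]

obs, null_mean, null_sd, p = monte_carlo_mean_difference(sample_a, sample_b)
print(obs, null_mean, null_sd, p)
```

The mean and standard deviation of the simulated differences describe the statistic under the null model; percentiles of the same simulated values could be used in a similar way to construct a confidence interval.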
Monte Carlo methods use numerical simulations to avoid the need for extensive alge-
braic computations and approximations. It may often be easier to program a Monte Carlo
simulation than to determine analytically the distribution of an intricate statistical func-
tion, particularly when the statistic is not a linear function. Because it is necessary to
assume a model of the distributions of the samples, the Monte Carlo method shares most
of the primary weaknesses of analytic statistics; if the observed distribution departs sub-
stantially from the model, the Monte Carlo sets will not represent the actual system of
interest. One useful feature of the Monte Carlo method is the ability to determine the effect
of different distributional models (those typically used are the uniform, the normal or
Gaussian, and the Poisson) on the range of values estimated by the Monte Carlo sets. The
comparison of observed distributions to those produced by Monte Carlo methods is a
powerful approach to hypothesis testing.
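One way to examine the effect of the distributional model is to run the same simulation under several models and compare the ranges of the statistic. The sketch below does this for the sample variance, a non-linear statistic. The count data are hypothetical, the helper names poisson_draw and simulated_range are made up for this example, and fitting the uniform model to the observed minimum and maximum is an assumption of convenience.

```python
import math
import random
import statistics

def poisson_draw(rng, lam):
    """Draw one Poisson variate by Knuth's multiplication method (small lam)."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k

def simulated_range(draw, n, statistic, n_sets=2000, seed=2):
    """Return the central 95% range of a statistic over simulated data sets."""
    rng = random.Random(seed)
    values = sorted(statistic([draw(rng) for _ in range(n)])
                    for _ in range(n_sets))
    return values[int(0.025 * n_sets)], values[int(0.975 * n_sets) - 1]

# Hypothetical counts, for illustration only.
counts = [2, 3, 1, 4, 2, 5, 3, 2, 6, 2, 3, 4]
mean = statistics.mean(counts)
sd = statistics.stdev(counts)

# Each model is fitted to the same observed sample.
models = {
    "normal": lambda rng: rng.gauss(mean, sd),
    "poisson": lambda rng: poisson_draw(rng, mean),
    "uniform": lambda rng: rng.uniform(min(counts), max(counts)),
}

ranges = {name: simulated_range(draw, len(counts), statistics.variance)
          for name, draw in models.items()}
for name, (low, high) in ranges.items():
    print(name, low, high)
```

If the ranges differ markedly between models, conclusions drawn from the simulation depend on the choice of model, and that choice deserves closer scrutiny.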
For example, if we wish to determine the significance of the observed difference in the
means of sets X and Y:
\[ X = \{2, 2, 3, 4, 2, 5, 3, 2, 6, 2, 3, 4, 6, 2, 1, 4, 3, 7, 2, 3, 4, 4, 5, 8, 5, 2, 1, 3, 4, 4, 3\} \tag{8A.33} \]
\[ Y = \{2, 2, 3, 2, 4, 2, 3, 2, 8, 9, 2, 9, 3, 2, 3, 3, 3, 9\} \tag{8A.34} \]
we will test the null hypothesis that the two sets (X and Y) came from the same underlying
distribution, with the observed difference between them being due to a random