word "confidence" is an important part of the statistical vocabulary).
There is a class of statistical methods called power analysis that allows an
optimal sample size for achieving a desired confidence to be estimated
prior to the experiment. While it is intuitively clear that a larger
sample size results in more powerful tests, larger samples are also more
expensive to collect, which makes power analysis a convenient tool whenever
data collection incurs expenses. In such a case,
the problem is to find the minimal sample size to achieve the desired
power of the test.
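To make this concrete, the sketch below (written in Python, an illustrative choice and not part of the text) estimates the minimal sample size for a two-sided z-test of a mean shift, assuming the population standard deviation is known; the effect size, significance level, and target power used here are hypothetical example values.

# A rough power-analysis sketch: smallest N for which a two-sided z-test
# detects a mean shift of size `delta` with the requested power, assuming
# the population standard deviation `sigma` is known (illustrative values).
import math
from scipy.stats import norm

def min_sample_size(delta, sigma, alpha=0.05, power=0.80):
    z_alpha = norm.ppf(1 - alpha / 2)   # critical value of the two-sided test
    z_power = norm.ppf(power)           # normal quantile matching the target power
    n = ((z_alpha + z_power) * sigma / delta) ** 2   # ignores the far tail, as usual
    return math.ceil(n)

print(min_sample_size(delta=0.5, sigma=1.0))   # about 32 observations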
The mathematical formulation of the problem uses some specific
language we outline next. The traditional way to describe a statistical
problem begins like this: Let x_1, x_2, ..., x_N be N independent observations
of a normally distributed random variable x with unknown parameters
μ and σ. This means we have collected some data by measuring a
random quantity known to have a normal distribution, and the values
from the measurements have been denoted by x_1, x_2, ..., x_N. These
values are the data points forming our sample for the random variable x.
In terms of our corn yield example, assume that the random variable
x represents the population yield from variety A that we are attempting to
estimate by sampling repeatedly. In the first trial, the sample size
was N = 2, while in the second trial we considered a larger sample size
of N = 10. The data points x_1 = 2.1, x_2 = 2.6 define the sample for x
from the first trial, while x_1 = 2.2, x_2 = 2.8, ..., x_10 = 2.0 define the
sample from the second trial.
As already mentioned, the mean of a random variable could be thought
of as the average after many trials. Thus, it is natural to expect that the
average value of the data points, denoted by x̄,

    \bar{x} = \frac{x_1 + x_2 + \cdots + x_N}{N} = \frac{1}{N} \sum_{i=1}^{N} x_i,          (4-3)

would be a good estimate of the mean value parameter μ of the normal
distribution.² This is why the average value calculated in Eq. (4-3) is
sometimes called the empirical mean or sample mean of the random
variable x. It can be proven that this is the best estimate (in terms of
statistical criteria), also called the maximum likelihood estimate. In these
terms, the test average you earn in a class is a maximum likelihood
estimate of your grade. Similarly, it can be shown that a maximum
likelihood estimate of the variance σ² of a normal distribution is given by
the formula

    s^2 = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_N - \bar{x})^2}{N - 1} = \frac{1}{N - 1} \sum_{i=1}^{N} (x_i - \bar{x})^2.          (4-4)
2. As is customary in mathematics, we have used Σ_{i=1}^{N} x_i to denote the sum
x_1 + x_2 + ... + x_N.
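As a quick numerical check of Eqs. (4-3) and (4-4), the short Python sketch below computes the sample mean and sample variance for the first corn-yield trial (x_1 = 2.1, x_2 = 2.6); using Python and NumPy for the cross-check is an illustrative choice, not something prescribed by the text.

# Sample mean (Eq. 4-3) and sample variance (Eq. 4-4) for the first trial,
# x1 = 2.1 and x2 = 2.6, so N = 2.
import numpy as np

data = [2.1, 2.6]
N = len(data)

x_bar = sum(data) / N                                # Eq. (4-3)
s2 = sum((x - x_bar) ** 2 for x in data) / (N - 1)   # Eq. (4-4)

print(x_bar, s2)   # 2.35 and 0.125, up to floating-point rounding

# Cross-check against NumPy; ddof=1 gives the N - 1 denominator of Eq. (4-4).
assert np.isclose(x_bar, np.mean(data))
assert np.isclose(s2, np.var(data, ddof=1))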