TABLE 3.1 Ingredients in a Sample Size Calculation

Type I error (α)
    Probability of falsely rejecting the hypothesis when it is true.
Type II error (1 - β[A])
    Probability of falsely accepting the hypothesis when an alternative hypothesis A is true. Depends on the alternative A.
Power = β[A]
    Probability of correctly rejecting the hypothesis when an alternative hypothesis A is true. Depends on the alternative A.
Distribution functions
    F[(x - μ)/σ], e.g., the normal distribution.
Location parameters
    For both hypothesis and alternative hypothesis: μ1, μ2.
Scale parameters
    For both hypothesis and alternative hypothesis: σ1, σ2.
Sample sizes
    May be different for different groups in an experiment with more than one group.
If we permit a greater number of Type I or Type II errors (holding all
other parameters fixed), we can decrease the required number of
observations.
Explicit formulas for power and significance level are available when the
underlying observations are binomial, the results of a counting or Poisson
process, or normally distributed. Several off-the-shelf computer programs,
including nQuery Advisor™, Pass 2000™, and StatXact™, are available to
do the calculations for us.
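As a rough illustration of what such a formula looks like, here is a minimal
Python sketch of the sample size calculation for a one-sided, one-sample
z-test with known standard deviation; the means, standard deviation, and
power below are made-up values, SciPy is assumed to be available, and the
commercial packages just mentioned perform far more elaborate versions of
this calculation.

    # A minimal sketch, not the commercial packages' method: the classical
    # normal-theory sample size for a one-sided, one-sample z-test with known
    # standard deviation. All numerical inputs are illustrative assumptions.
    from math import ceil
    from scipy.stats import norm

    def sample_size_one_sided_z(mu0, mu1, sigma, alpha=0.05, power=0.80):
        """Smallest n giving Type I error alpha and the stated power at mu1."""
        delta = abs(mu1 - mu0)            # difference we wish to detect
        z_alpha = norm.ppf(1 - alpha)     # critical value under the hypothesis
        z_power = norm.ppf(power)         # quantile delivering the desired power
        return ceil(((z_alpha + z_power) * sigma / delta) ** 2)

    # Permitting a larger Type I error (all else held fixed) lowers the required n.
    for alpha in (0.01, 0.05, 0.10):
        print(alpha, sample_size_one_sided_z(mu0=5.0, mu1=6.0, sigma=2.0, alpha=alpha))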
To use these programs, we need to have some idea of the location
(mean) and scale parameter (variance) of the distribution both when the
primary hypothesis is true and when an alternative hypothesis is true.
Since there may well be an infinity of alternatives in which we are inter-
ested, power calculations should be based on the worst-case or boundary
value. For example, if we are testing a binomial hypothesis p = 1/2
against the alternatives p ≥ 2/3, we would assume that p = 2/3.
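To make the boundary-value idea concrete, here is a small sketch of an exact
power calculation for this binomial example; the sample size of 30 and the
5% significance level are assumptions added for illustration, and the test is
taken to reject for large counts of successes.

    # A minimal sketch: exact size and power at the boundary alternative p = 2/3
    # for a one-sided test of p = 1/2 that rejects when the success count is large.
    # The sample size n = 30 and alpha = 0.05 are illustrative assumptions.
    from scipy.stats import binom

    def boundary_power(n, p0=0.5, p_boundary=2/3, alpha=0.05):
        # Smallest cutoff c with P(X >= c | p0) <= alpha; binom.sf(c - 1, ...) is P(X >= c).
        c = min(k for k in range(n + 1) if binom.sf(k - 1, n, p0) <= alpha)
        size = binom.sf(c - 1, n, p0)            # achieved significance level
        power = binom.sf(c - 1, n, p_boundary)   # power at the worst-case alternative
        return c, size, power

    print(boundary_power(n=30))   # cutoff, achieved alpha, power at p = 2/3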
If the data do not come from one of the preceding distributions, then
we might use a bootstrap to estimate the power and significance level.
In preliminary trials of a new device, the following test results were
observed: 7.0 in 11 out of 12 cases and 3.3 in 1 out of 12 cases. Industry
guidelines specified that any population with a mean test result greater
than 5 would be acceptable. A worst-case or boundary-value scenario
would be one in which the test result was 7.0 three-sevenths of the time,
3.3 three-sevenths of the time, and 4.1 one-seventh of the time; the mean
of this distribution is exactly 5, the boundary of acceptability.
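Before describing the sampling procedure, here is a minimal Python sketch of
how such a bootstrap might be set up; the cutoff of 6 anticipates the
rejection rule given in the next paragraph, and the candidate sample sizes
and number of replications are arbitrary choices for illustration.

    # A minimal sketch of the bootstrap: resample with replacement from the
    # observed results and from the worst-case distribution, and track how often
    # the sample mean falls below the cutoff of 6 used in the rejection rule.
    import numpy as np

    rng = np.random.default_rng(0)
    observed   = np.array([7.0] * 11 + [3.3])              # preliminary trial results
    worst_case = np.array([7.0] * 3 + [3.3] * 3 + [4.1])   # boundary case; mean is exactly 5

    def prob_mean_below(population, n, cutoff=6.0, reps=10_000):
        """Bootstrap estimate of P(sample mean < cutoff) for samples of size n."""
        samples = rng.choice(population, size=(reps, n), replace=True)
        return (samples.mean(axis=1) < cutoff).mean()

    for n in (4, 8, 16):                                   # candidate sample sizes
        print(n, prob_mean_below(observed, n), prob_mean_below(worst_case, n))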
The statistical procedure required us to reject if the sample mean of the
test results were less than 6. To determine the probability of this event for
various sample sizes, we took repeated samples with replacement from the
two sets of test results. Some bootstrap samples consisted of all 7's,
whereas some, taken from the worst-case distribution, consisted only of