FIGURE 7.4 Rugplot of 50 Bootstrap Medians Derived from a Sample of
Sixth Graders' Heights.
Apparently, 12 observations are enough for a sample mean to be
approximately normally distributed when the variables come from a uniform
distribution. But if you take a smaller sample of observations from a
U[0,1] population, the distribution of its mean will look less like a
bell-shaped curve.
A loose rule of thumb is that the mean of a sample of 8 to 25 observa-
tions will have a distribution that is close enough to the normal for the
standard error to be meaningful. The more nonsymmetric the original dis-
tribution, the larger the sample size required. At least 25 observations are
needed for a binomial distribution with p = 0.1.
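The claim about uniform samples can be checked by simulation. The sketch
below is only an illustration, not part of the original text: the function
names, the seed, and the choice of 20,000 replications are assumptions. It
draws repeated samples of size n from U[0,1], computes each sample mean, and
reports the spread and excess kurtosis of those means (a normal distribution
has excess kurtosis 0).

```python
import numpy as np

def simulate_sample_means(n, reps=20000, seed=0):
    """Means of `reps` independent samples of size n drawn from U[0,1]."""
    rng = np.random.default_rng(seed)
    return rng.uniform(0.0, 1.0, size=(reps, n)).mean(axis=1)

def excess_kurtosis(x):
    """Sample excess kurtosis: 0 for a normal distribution."""
    z = (x - x.mean()) / x.std()
    return float(np.mean(z ** 4) - 3.0)

for n in (2, 12):
    means = simulate_sample_means(n)
    # Theoretical SD of the mean of n U[0,1] draws is sqrt(1 / (12 * n)).
    print(n, round(float(means.std()), 4), round(excess_kurtosis(means), 3))
```

With n = 12 the excess kurtosis should be close to zero, while with n = 2
the means follow a triangular distribution and the departure from the bell
shape is more visible.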
Even the mean of observations taken from a mixture of distributions
(males and females, tall Zulu and short Bantu), whose distribution curve
resembles a camel with multiple humps, will have an approximately normal
distribution if the sample size is large enough. Of course, this mean (or
even the median) conceals the fact that the sample was taken from a
mixture of distributions.
If the underlying distribution is not symmetric, the use of the ± SE
notation can be deceptive because it suggests a nonexistent symmetry. For
samples of size 6 or less from nonsymmetric distributions, tabulate the
minimum, the median, and the maximum. For samples of size 7 and up,
consider using a box-and-whisker plot as in Figure 7.3. For samples of
size 16 and up, the bootstrap (described in Chapters 4 and 5) may
provide the answer you need.
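The size thresholds above can be expressed as a small helper. This is a
minimal sketch under stated assumptions: the function name and the returned
dictionary layout are hypothetical, and the quartiles use NumPy's default
interpolation, which may differ slightly from the convention used to draw a
particular box-and-whisker plot.

```python
import numpy as np

def summarize_small_sample(values):
    """Pick a summary by sample size, following the thresholds in the text."""
    x = np.sort(np.asarray(values, dtype=float))
    n = int(x.size)
    if n == 0:
        raise ValueError("sample must not be empty")
    summary = {"n": n, "min": float(x[0]),
               "median": float(np.median(x)), "max": float(x[-1])}
    if n >= 7:
        # Quartiles are the extra ingredients of a box-and-whisker plot.
        q1, q3 = np.percentile(x, [25, 75])
        summary["q1"] = float(q1)
        summary["q3"] = float(q3)
        # From size 16 up, the text suggests the bootstrap may also help.
        summary["consider_bootstrap"] = n >= 16
    return summary

print(summarize_small_sample([3, 1, 2]))
print(summarize_small_sample(range(1, 17)))
```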
As in Chapters 4 and 5, we would treat the original sample as a stand-in
for the population and resample from it repeatedly, 1000 times or so, with
replacement, computing the sample statistic each time to obtain a
distribution similar to that depicted in Figure 7.4. To provide an
interpretation compatible with that given the standard error when used
with a sample from a normally distributed population, we would want to
report the values of the 16th and 84th percentiles of the bootstrap
distribution along with the sample statistic. For a normal distribution,
these two percentiles lie about one standard deviation below and above
the mean.
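The resampling procedure just described might look like the following in
Python. It is a sketch, not the authors' code: the heights are made-up
values standing in for the sixth graders' data, and the function name, seed,
and default of 1000 resamples are assumptions.

```python
import numpy as np

def bootstrap_distribution(sample, statistic=np.median, n_boot=1000, seed=0):
    """Recompute `statistic` on n_boot resamples drawn with replacement."""
    rng = np.random.default_rng(seed)
    sample = np.asarray(sample, dtype=float)
    # Each row of idx selects one resample of the same size as the original.
    idx = rng.integers(0, sample.size, size=(n_boot, sample.size))
    return np.array([statistic(sample[row]) for row in idx])

# Hypothetical heights in cm, for illustration only.
heights_cm = [137.0, 138.5, 140.5, 140.8, 141.0, 142.0, 143.5, 144.0,
              145.0, 146.5, 147.0, 148.0, 150.5, 151.0, 153.0, 153.5,
              154.0, 155.0, 156.5, 157.0, 158.5, 159.0]

boot = bootstrap_distribution(heights_cm)
lo, hi = np.percentile(boot, [16, 84])
print("median:", float(np.median(heights_cm)))
print("16th and 84th percentiles:", float(lo), float(hi))
```

The two percentiles need not sit at equal distances from the sample median,
which is exactly the asymmetry that the ± SE notation would hide.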
When the estimator is other than the mean, we cannot count on the
Central Limit Theorem to ensure a symmetric sampling distribution. We
recommend you use the bootstrap whenever you report an estimate of a
ratio or dispersion.
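As one illustration for a ratio, the sketch below bootstraps the ratio of
two means. Several things here are assumptions rather than anything stated
in the text: the observations are taken to be paired (so pairs are resampled
together), the data are invented, and the function name is hypothetical.

```python
import numpy as np

def bootstrap_ratio_of_means(x, y, n_boot=1000, seed=0):
    """Return (estimate, 16th percentile, 84th percentile) of mean(x)/mean(y)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape:
        raise ValueError("x and y must be paired and of equal length")
    # Resample pair indices with replacement so each (x, y) pair stays intact.
    idx = rng.integers(0, x.size, size=(n_boot, x.size))
    ratios = x[idx].mean(axis=1) / y[idx].mean(axis=1)
    lo, hi = np.percentile(ratios, [16, 84])
    return float(x.mean() / y.mean()), float(lo), float(hi)

# Invented paired measurements, for illustration only.
x_obs = [12.1, 9.8, 15.3, 30.2, 11.0, 8.7, 14.9, 10.4,
         22.5, 9.9, 13.2, 18.8, 10.7, 9.1, 16.4, 12.8]
y_obs = [6.0, 5.1, 7.9, 9.5, 5.8, 4.9, 7.0, 5.5,
         8.8, 5.2, 6.4, 8.1, 5.6, 4.7, 7.7, 6.3]

estimate, lo, hi = bootstrap_ratio_of_means(x_obs, y_obs)
print(estimate, lo, hi)
```

For a measure of dispersion, the same resampling loop applies with the
ratio replaced by, say, the sample standard deviation of each resample.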
If you possess some prior knowledge of the shape of the population dis-
tribution, you should take advantage of that knowledge by using a para-
metric bootstrap (see Chapter 4). The parametric bootstrap is particularly