Estimation - Common Errors in Statistics


samples under size 30, the interval is still suspect. The idea behind these intervals comes from the observation that percentile bootstrap intervals are most accurate when the estimate is symmetrically distributed about the true value of the parameter and the tails of the estimate's distribution drop off rapidly to zero. The symmetric, bell-shaped normal distribution depicted in Figure 7.1 represents this ideal.
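A percentile interval reads off quantiles of the bootstrap distribution of the estimate. Here is a minimal Python sketch using only the standard library. The function name percentile_interval, the 2000 resamples, the 90% default level and the simulated data are illustrative choices, not taken from the text.

```python
import random
import statistics


def percentile_interval(data, estimator, n_boot=2000, level=0.90, seed=None):
    """Percentile bootstrap interval for estimator(data).

    Resamples the data with replacement n_boot times, recomputes the
    estimate on each resample, and reports the empirical (1 - level)/2 and
    (1 + level)/2 quantiles of those bootstrap estimates.
    """
    rng = random.Random(seed)
    n = len(data)
    boot = sorted(estimator(rng.choices(data, k=n)) for _ in range(n_boot))
    # Number of bootstrap estimates cut off in each tail; round() avoids
    # floating-point truncation such as int(99.99999...) == 99.
    k = round((1 - level) / 2 * n_boot)
    return boot[k], boot[n_boot - 1 - k]


# Usage with hypothetical simulated data: 30 draws from a standard normal.
gen = random.Random(1)
sample = [gen.gauss(0.0, 1.0) for _ in range(30)]
lo, hi = percentile_interval(sample, statistics.fmean, seed=42)
print(f"90% percentile interval for the mean: ({lo:.3f}, {hi:.3f})")
```

The interval is taken directly from the bootstrap quantiles. It therefore inherits any skewness or bias in that distribution, which is why the symmetric, light-tailed case is the favorable one.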

Suppose θ is the parameter we are trying to estimate and θ̂ is the estimate. Suppose also that we can find a monotone increasing transformation m such that m(θ̂) is normally distributed about m(θ). We could use this normal distribution to obtain an unbiased confidence interval for m(θ). Applying the back-transformation m⁻¹ to its endpoints would then give an almost-unbiased confidence interval for θ.³
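The transformation m never has to be found explicitly. Efron's bias-corrected (BC) percentile interval assumes such an m exists and adjusts which bootstrap percentiles are reported. In the sketch below, z0 is estimated from the share of bootstrap estimates that fall below θ̂. The sketch omits the acceleration constant that turns BC into BCa. The function name bc_interval, its defaults and the exponential test data are our own assumptions, not the interface of any package.

```python
import math
import random
import statistics
from statistics import NormalDist


def bc_interval(data, estimator, n_boot=2000, level=0.90, seed=None):
    """Bias-corrected (BC) percentile bootstrap interval.

    Assumes some monotone m exists with
    m(estimate) ~ Normal(m(parameter) - z0 * sigma, sigma**2);
    m itself is never computed. The BCa acceleration constant is omitted.
    """
    rng = random.Random(seed)
    n = len(data)
    theta_hat = estimator(data)
    boot = sorted(estimator(rng.choices(data, k=n)) for _ in range(n_boot))
    std_normal = NormalDist()

    # z0: normal quantile of the share of bootstrap estimates below theta_hat,
    # clamped away from 0 and 1 so that inv_cdf stays finite.
    share = sum(b < theta_hat for b in boot) / n_boot
    share = min(max(share, 0.5 / n_boot), 1 - 0.5 / n_boot)
    z0 = std_normal.inv_cdf(share)

    # Shift the nominal percentiles on the normal scale, then map back.
    alpha = (1 - level) / 2
    p_lo = std_normal.cdf(2 * z0 + std_normal.inv_cdf(alpha))
    p_hi = std_normal.cdf(2 * z0 + std_normal.inv_cdf(1 - alpha))

    def quantile(p):
        # Empirical p-quantile of the sorted bootstrap estimates.
        idx = min(max(math.ceil(p * n_boot) - 1, 0), n_boot - 1)
        return boot[idx]

    return quantile(p_lo), quantile(p_hi)


# Usage with hypothetical right-skewed data (exponential, mean 1).
gen = random.Random(7)
sample = [gen.expovariate(1.0) for _ in range(40)]
lo, hi = bc_interval(sample, statistics.fmean, seed=3)
print(f"90% BC interval for the mean: ({lo:.3f}, {hi:.3f})")
```

When the bootstrap estimates are centered on θ̂ (z0 = 0), the adjusted percentiles reduce to α and 1 − α, and the interval is the ordinary percentile interval. BCa also estimates an acceleration constant, often by the jackknife. That step is not shown here.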

Even with these modifications, we do not recommend the use of the nonparametric bootstrap with samples of fewer than 100 observations. Simulation studies suggest that with small sample sizes, the coverage is far from exact and the endpoints of the intervals vary widely from one set of bootstrap samples to the next. For example, Tu and Zhang [1992] report that with samples of size 50 taken from a normal distribution, the actual coverage of an interval estimate rated at 90% using the BCa bootstrap is 88%. When the samples are taken from a mixture of two normal distributions (a not uncommon situation with real-life data sets), the actual coverage is 86%. With samples of only 20 observations, the actual coverage is 80%.
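Coverage figures like these come from Monte Carlo experiments. Many data sets are drawn from a known population, a nominal 90% interval is built from each, and the analyst counts how often the interval contains the true parameter. The Python sketch below follows that design. It assumes a normal population, the sample mean as the estimate, and the plain percentile interval rather than BCa. The function names and simulation sizes are hypothetical, and the sketch is not meant to reproduce Tu and Zhang's figures.

```python
import random
import statistics


def percentile_interval_of_mean(data, rng, n_boot, level):
    """Percentile bootstrap interval for the mean, using the caller's RNG."""
    n = len(data)
    boot = sorted(statistics.fmean(rng.choices(data, k=n)) for _ in range(n_boot))
    k = round((1 - level) / 2 * n_boot)
    return boot[k], boot[n_boot - 1 - k]


def estimate_coverage(sample_size, n_datasets=200, n_boot=500, level=0.90, seed=0):
    """Fraction of simulated data sets whose interval covers the true mean.

    Data sets are drawn from a standard normal population (an assumption made
    for illustration), so the true mean is known to be 0. Each data set gets
    its own fresh set of bootstrap resamples.
    """
    rng = random.Random(seed)
    true_mean = 0.0
    covered = 0
    for _ in range(n_datasets):
        data = [rng.gauss(true_mean, 1.0) for _ in range(sample_size)]
        lo, hi = percentile_interval_of_mean(data, rng, n_boot, level)
        covered += lo <= true_mean <= hi
    return covered / n_datasets


for n in (20, 50):
    print(f"n = {n}: estimated coverage of nominal 90% interval = "
          f"{estimate_coverage(n):.3f}")
```

With 200 data sets, the estimated coverage has a Monte Carlo standard error of roughly √(0.9 × 0.1 / 200) ≈ 0.02. A difference of a point or two between sample sizes is therefore within simulation noise.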

A more serious problem when applying the bootstrap is that the endpoints of the resulting interval estimates may vary widely from one set of bootstrap samples to the next. For example, Tu and Zhang drew samples of size 50 from a mixture of normal distributions and computed intervals from 1000 bootstrap samples taken from each of 1000 simulated data sets. The left limit averaged 0.72 with a standard deviation of 0.16. The right limit averaged 1.37 with a standard deviation of 0.30.
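As described, Tu and Zhang's figures appear to combine two sources of variation: the simulated data sets and the bootstrap resamples drawn from each. The resampling component alone can be isolated by holding one data set fixed and rerunning the bootstrap with fresh resamples. The Python sketch below does this. It assumes a percentile interval for the mean and an arbitrary two-component normal mixture chosen only for illustration, since Tu and Zhang's mixture parameters are not given here. endpoint_spread is a hypothetical helper name.

```python
import random
import statistics


def endpoint_spread(data, n_repeats=100, n_boot=1000, level=0.90, seed=0):
    """Repeat the percentile bootstrap on one fixed data set.

    Returns (mean, sd) of the left limits and (mean, sd) of the right limits
    across n_repeats independent sets of n_boot resamples.
    """
    rng = random.Random(seed)
    n = len(data)
    k = round((1 - level) / 2 * n_boot)
    lefts, rights = [], []
    for _ in range(n_repeats):
        boot = sorted(statistics.fmean(rng.choices(data, k=n)) for _ in range(n_boot))
        lefts.append(boot[k])
        rights.append(boot[n_boot - 1 - k])
    return ((statistics.fmean(lefts), statistics.stdev(lefts)),
            (statistics.fmean(rights), statistics.stdev(rights)))


# Hypothetical data: 50 draws from an equal mixture of N(0, 1) and N(3, 1).
gen = random.Random(11)
sample = [gen.gauss(0.0, 1.0) if gen.random() < 0.5 else gen.gauss(3.0, 1.0)
          for _ in range(50)]
(left_mean, left_sd), (right_mean, right_sd) = endpoint_spread(sample)
print(f"left limit:  mean {left_mean:.3f}, sd {left_sd:.3f}")
print(f"right limit: mean {right_mean:.3f}, sd {right_sd:.3f}")
```

Increasing n_boot reduces this resampling component. It cannot reduce the variation that comes from the data themselves.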


Parametric Bootstrap

Even when we know the form of the population distribution, the parametric bootstrap may prove advantageous for obtaining interval estimates. It may provide more accurate answers than textbook formulas, or no textbook formula may exist.

Suppose we know that the observations come from a normal distribution and we want an interval estimate for the standard deviation. We would draw repeated bootstrap samples from a normal distribution whose mean is the sample mean and whose variance is the sample variance.
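The procedure for the standard deviation can be sketched as follows: fit the normal distribution, simulate from it, recompute the statistic on each simulated sample, and read off percentiles. Reading off percentiles of the simulated standard deviations is our assumption about how the interval is formed, since the text here specifies only how the bootstrap samples are drawn. parametric_sd_interval and the simulated data are hypothetical.

```python
import random
import statistics


def parametric_sd_interval(data, n_boot=2000, level=0.90, seed=None):
    """Parametric bootstrap percentile interval for a normal population's SD.

    Each bootstrap sample is drawn from the fitted normal distribution
    N(sample mean, sample variance), not resampled from the data.
    """
    rng = random.Random(seed)
    n = len(data)
    mu = statistics.fmean(data)
    sigma = statistics.stdev(data)  # sample SD, divisor n - 1
    boot = sorted(
        statistics.stdev([rng.gauss(mu, sigma) for _ in range(n)])
        for _ in range(n_boot)
    )
    # Percentile read-off is an assumed choice; the text only gives the sampling step.
    k = round((1 - level) / 2 * n_boot)
    return boot[k], boot[n_boot - 1 - k]


# Hypothetical data: 25 draws from N(10, 2^2).
gen = random.Random(5)
sample = [gen.gauss(10.0, 2.0) for _ in range(25)]
lo, hi = parametric_sd_interval(sample, seed=8)
print(f"sample SD {statistics.stdev(sample):.3f}; "
      f"90% parametric bootstrap interval ({lo:.3f}, {hi:.3f})")
```

For normal data, a textbook interval based on the chi-square distribution is also available, so this case mainly illustrates the mechanics. The same loop applies unchanged to statistics for which no such formula exists.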

³ Stata™ provides bias-corrected intervals via its bstrap command. R and S-Plus both include BCa functions. A SAS macro is available at http://www.asu.edu/it/fyi/research/helpdocs/statistics/SAS/tips/jackboot.html.
