observations. To simplify comparisons, most statisticians focus on the
asymptotic relative efficiency (ARE), defined as the limit with increasing
sample size of the ratio of the number of observations required for each
of two consistent statistical procedures to achieve the same degree of
accuracy.
Robust
Estimators that are perfectly satisfactory for use with symmetric normally
distributed populations may not be as desirable when the data come from
nonsymmetric or heavy-tailed populations, or when there is a substantial
risk of contamination with extreme values.
When estimating measures of central location, one way to create a more
robust estimator is to trim the sample of its minimum and maximum
values (the procedure used when judging ice-skating or gymnastics). As
information is thrown away, trimmed estimators are less efficient.
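The trimming idea can be sketched in a few lines. The function name `trimmed_mean`, the parameter `k`, and the scores below are illustrative assumptions, not taken from the text; with `k=1` only the single minimum and maximum are dropped, as in the judging example.

```python
from statistics import mean

def trimmed_mean(values, k=1):
    """Mean after dropping the k smallest and k largest observations."""
    if k < 0:
        raise ValueError("k must be non-negative")
    ordered = sorted(values)
    if len(ordered) <= 2 * k:
        raise ValueError("not enough observations left after trimming")
    return mean(ordered[k:len(ordered) - k])

# Hypothetical judges' scores; 9.9 plays the role of an extreme value.
scores = [5.6, 5.7, 5.8, 5.8, 5.9, 9.9]

print("plain mean:  ", mean(scores))          # pulled upward by the 9.9
print("trimmed mean:", trimmed_mean(scores))  # average of the middle four scores
```

The trimmed estimate ignores the two discarded scores entirely, which is the source of both its robustness and its loss of efficiency.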
In many instances, LAD (least absolute deviation) estimators are more
robust than their LS (least squares) counterparts.[1] This finding is in line
with our discussion of the F statistic in the preceding chapter.
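To see the difference in the simplest setting, consider fitting a single location constant rather than a regression line (a simplifying assumption made here for illustration). The LS criterion is then minimized by the mean and the LAD criterion by the median. The sketch below uses a brute-force grid search and made-up data; the helper names are hypothetical.

```python
from statistics import mean, median

def ls_loss(c, data):
    """Least-squares criterion: sum of squared deviations from c."""
    return sum((x - c) ** 2 for x in data)

def lad_loss(c, data):
    """Least-absolute-deviation criterion: sum of absolute deviations from c."""
    return sum(abs(x - c) for x in data)

def grid_argmin(loss, data, lo, hi, steps=2000):
    """Brute-force minimizer over an evenly spaced grid (illustration only)."""
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return min(grid, key=lambda c: loss(c, data))

clean = [1.0, 2.0, 3.0, 4.0, 5.0]
contaminated = [1.0, 2.0, 3.0, 4.0, 50.0]  # one value replaced by an extreme one

for name, data in (("clean", clean), ("contaminated", contaminated)):
    ls_fit = grid_argmin(ls_loss, data, 0.0, 50.0)
    lad_fit = grid_argmin(lad_loss, data, 0.0, 50.0)
    print(f"{name:>12}: LS fit = {ls_fit:.3f}, LAD fit = {lad_fit:.3f}")
```

On the contaminated sample the LS fit follows the mean toward the extreme value, while the LAD fit stays at the median.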
Many semiparametric estimators are not only robust but provide for
high ARE with respect to their parametric counterparts.
As an example of a semiparametric estimator, suppose the $\{X_i\}$ are
independent identically distributed (i.i.d.) observations with distribution
$\Pr\{X_i \le x\} = F[x - \Delta]$ and we want to estimate the location parameter $\Delta$
without having to specify the form of the distribution F. If F is normal
and the loss function is proportional to the square of the estimation error,
then the arithmetic mean is optimal for estimating $\Delta$. Suppose, on the
other hand, that F is symmetric but more likely to include very large or
very small values than a normal distribution. Whether the loss function is
proportional to the absolute value or the square of the estimation error,
the median, a semiparametric estimator, is to be preferred. The median
has an ARE relative to the mean that ranges from 0.64 (if the observations
really do come from a normal distribution) to values well in excess
of 1 for distributions with higher proportions of very large and very small
values (Lehmann, 1998, p. 242). Still, if the unknown distribution is
“almost” normal, the mean would be far preferable.
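A small simulation can make these efficiency figures concrete. For large samples, the ARE of the median relative to the mean is approximately the ratio Var(mean) / Var(median). The sketch below estimates that ratio for a normal F and for a Laplace (double-exponential) F; the Laplace choice, the sample size, the number of replications, and the function names are all assumptions made for illustration, and the results are finite-sample approximations rather than exact asymptotic values.

```python
import random
from statistics import median, pvariance

def normal(rng):
    """Standard normal draw."""
    return rng.gauss(0.0, 1.0)

def laplace(rng):
    """Standard Laplace draw (difference of two unit exponentials):
    symmetric, but with heavier tails than the normal."""
    return rng.expovariate(1.0) - rng.expovariate(1.0)

def simulated_efficiency(draw, n=101, reps=2000, seed=0):
    """Estimate Var(mean) / Var(median) over `reps` samples of size n."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(reps):
        sample = [draw(rng) for _ in range(n)]
        means.append(sum(sample) / n)
        medians.append(median(sample))
    return pvariance(means) / pvariance(medians)

are_normal = simulated_efficiency(normal)
are_laplace = simulated_efficiency(laplace)
print(f"normal  F: efficiency of median vs. mean ~ {are_normal:.2f}")
print(f"Laplace F: efficiency of median vs. mean ~ {are_laplace:.2f}")
```

Under the normal the ratio should fall below 1, in the neighborhood of the 0.64 quoted above; under the heavier-tailed Laplace it should exceed 1.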
If we are uncertain whether or not F is symmetric, then our best choice
is the Hodges-Lehmann estimator defined as the median of the pairwise
averages
$$\hat{\Delta} = \operatorname{median}_{i \le j} \frac{X_i + X_j}{2}.$$
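The median of pairwise averages can be computed directly from its definition. The following is a minimal sketch: the function name `hodges_lehmann` and the sample are illustrative assumptions, and the approach builds all n(n + 1)/2 averages in memory, so it is suitable only for modest sample sizes.

```python
from itertools import combinations_with_replacement
from statistics import mean, median

def hodges_lehmann(values):
    """Median of the pairwise averages (X_i + X_j) / 2 over all i <= j."""
    values = list(values)
    if not values:
        raise ValueError("need at least one observation")
    # combinations_with_replacement yields every pair with i <= j,
    # including each observation paired with itself.
    pairwise_averages = [
        (a + b) / 2 for a, b in combinations_with_replacement(values, 2)
    ]
    return median(pairwise_averages)

sample = [1.0, 2.0, 3.0, 4.0, 100.0]  # made-up data with one extreme value

print("mean:          ", mean(sample))
print("median:        ", median(sample))
print("Hodges-Lehmann:", hodges_lehmann(sample))
```

For this sample the single extreme value drags the mean far from the bulk of the data, while the Hodges-Lehmann estimate stays with the central observations.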
[1] See, for example, Yoo [2001].