These desirable properties, indeed the ability to obtain coefficient values
that are of use in practical applications, will not be present if the wrong
model has been adopted. They will not be present if successive observations
are dependent. The values of the coefficients produced by the software
will not be of use if the associated losses depend on some function of
the observations other than the sum of the squares of the differences
between what is observed and what is predicted. In many practical problems,
one is more concerned with minimizing the sum of the absolute
values of the differences or with minimizing the maximum prediction error.
Finally, if the error terms come from a distribution that is far from Gaussian,
a distribution that is truncated, flattened, or asymmetric, the p values
and precision estimates produced by the software may be far from correct.
Alternatively, we may use permutation methods to test for the significance
of the resulting coefficients. Provided that the $\{e_i\}$ are independent
and identically distributed (Gaussian or not), the resulting p values will be
exact. They will be exact regardless of which goodness-of-fit criterion is
employed.
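
To see why the choice of goodness-of-fit criterion matters, consider the simplest possible model, one with an intercept and no slope. This is an illustrative simplification, not a case worked in the text: under the sum-of-squares criterion the best constant is the mean, under the sum-of-absolute-values criterion it is the median, and under the maximum-error criterion it is the midrange. The function names below are hypothetical, and the sketch uses only the Python standard library.

```python
import statistics


def best_constant(values, criterion):
    """Return the constant c that minimizes the chosen loss over `values`.

    Intercept-only illustration (an assumption made for brevity):
      "squares"  -> mean      minimizes sum((v - c)**2)
      "absolute" -> median    minimizes sum(abs(v - c))
      "maximum"  -> midrange  minimizes max(abs(v - c))
    """
    if criterion == "squares":
        return statistics.mean(values)
    if criterion == "absolute":
        return statistics.median(values)
    if criterion == "maximum":
        return (min(values) + max(values)) / 2
    raise ValueError(f"unknown criterion: {criterion!r}")


def loss(values, c, criterion):
    """Evaluate the chosen loss for the constant prediction c."""
    errors = [abs(v - c) for v in values]
    if criterion == "squares":
        return sum(e * e for e in errors)
    if criterion == "absolute":
        return sum(errors)
    if criterion == "maximum":
        return max(errors)
    raise ValueError(f"unknown criterion: {criterion!r}")


if __name__ == "__main__":
    data = [1, 2, 3, 4, 20]  # made-up values with one extreme observation
    for name in ("squares", "absolute", "maximum"):
        c = best_constant(data, name)
        print(name, c, loss(data, c, name))
```

With a single extreme value in the data, the three criteria give three quite different estimates, which is the sense in which coefficients tuned to one loss may be of little use under another.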
Suppose that our hypothesis is that $y_i = a + b x_i + e_i$ for all $i$ and $b = b_0$.
First, we substitute $y'_i = y_i - b_0 x_i$ in place of the original observations $y_i$.
Our translated hypothesis is $y'_i = a + b' x_i + e_i$ for all $i$ and $b' = 0$ or, equivalently,
$r = 0$, where $r$ is the correlation between the variables $Y'$ and $X$.
Our test for correlation is based on the permutation distribution of the
sum of the cross-products $y'_i x_i$ (Pitman, 1938). Alternative tests based on
permutations include those of Cade and Richards [1996], and tests based
on MRPP LAD regression include those of Mielke and Berry [1997].
For large samples, these tests are every bit as sensitive as the least-squares
test described in the previous paragraph even when all the conditions for
applying that test are satisfied (Mielke and Berry, 2001, Section 5.4).
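
The permutation test on the sum of cross-products can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it draws random rearrangements (a Monte Carlo approximation to the full permutation distribution, so the p value is approximate rather than exact), it makes the test two-sided by measuring distance from the permutation mean of the statistic, and the function name and parameters are hypothetical.

```python
import random


def permutation_slope_test(x, y, b0=0.0, n_perm=10000, seed=0):
    """Approximate two-sided permutation p value for H0: b = b0
    in the model y_i = a + b*x_i + e_i."""
    rng = random.Random(seed)
    # Translate the observations so that the hypothesis becomes b' = 0.
    y_prime = [yi - b0 * xi for xi, yi in zip(x, y)]
    n = len(x)
    # Mean of the statistic over all rearrangements: n * mean(x) * mean(y').
    center = n * (sum(x) / n) * (sum(y_prime) / n)

    def cross_products(ys):
        return sum(xi * yi for xi, yi in zip(x, ys))

    observed = abs(cross_products(y_prime) - center)
    shuffled = list(y_prime)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # rearrange the y' values relative to x
        # Small tolerance guards against floating-point ties.
        if abs(cross_products(shuffled) - center) >= observed - 1e-12:
            extreme += 1
    # Count the observed arrangement itself, so the p value is never zero.
    return (extreme + 1) / (n_perm + 1)


if __name__ == "__main__":
    xs = list(range(1, 11))
    noise = [1, -1, -1, 1, 0, 0, 1, -1, -1, 1]  # made-up errors
    ys = [3 * xi + ei for xi, ei in zip(xs, noise)]
    print(permutation_slope_test(xs, ys, b0=0.0))  # tests b = 0
    print(permutation_slope_test(xs, ys, b0=3.0))  # tests b = 3
```

For small samples one could enumerate every rearrangement instead of sampling, which recovers the exact p value at the cost of n! evaluations.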
If the errors are dependent and normally distributed and the covariances
are the same for every pair of errors, then we may also apply any of the
permutation methods described above. If the errors are dependent and
normally distributed, but we are reluctant to make such a strong assumption
about the covariances, then our analysis may call for dynamic regression
models (Pankratz, 1991). 1
FURTHER CONSIDERATIONS
Bad Data
The presence of bad data can completely distort regression calculations.
When least-squares methods are employed, a single outlier can influence

1 In the SAS manual, these are called ARIMAX techniques and are incorporated in Proc ARIMA.