Univariate Regression - Common Errors in Statistics


These desirable properties, indeed the ability to obtain coefficient values that are of use in practical applications, will not be present if the wrong model has been adopted. They will not be present if successive observations are dependent. The values of the coefficients produced by the software will not be of use if the associated losses depend on some function of the observations other than the sum of the squares of the differences between what is observed and what is predicted. In many practical problems, one is more concerned with minimizing the sum of the absolute values of the differences or with minimizing the maximum prediction error. Finally, if the error terms come from a distribution that is far from Gaussian, a distribution that is truncated, flattened, or asymmetric, the p values and precision estimates produced by the software may be far from correct.
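To make the point about loss functions concrete, here is a minimal sketch in Python comparing a least-squares fit with a fit that minimizes the sum of absolute deviations (LAD). The data set and all function names are hypothetical and chosen only for illustration. The brute-force LAD search assumes simple linear regression with at least two distinct x values, where some minimizer of the sum of absolute deviations passes through two of the data points.

```python
from itertools import combinations


def least_squares_fit(x, y):
    """Closed-form intercept and slope minimizing the sum of squared errors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def sum_abs_error(a, b, x, y):
    """Sum of absolute differences between observed and predicted values."""
    return sum(abs(yi - a - b * xi) for xi, yi in zip(x, y))


def max_abs_error(a, b, x, y):
    """Largest absolute prediction error."""
    return max(abs(yi - a - b * xi) for xi, yi in zip(x, y))


def lad_fit(x, y):
    """Brute-force LAD fit: try every line through two data points."""
    best = None
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        if x1 == x2:
            continue  # a vertical line is not a candidate
        b = (y2 - y1) / (x2 - x1)
        a = y1 - b * x1
        loss = sum_abs_error(a, b, x, y)
        if best is None or loss < best[0]:
            best = (loss, a, b)
    return best[1], best[2]


# Hypothetical data: y = 1 + 2x exactly, except for one gross outlier.
x = [0, 1, 2, 3, 4, 5]
y = [1, 3, 5, 7, 9, 40]

ls_a, ls_b = least_squares_fit(x, y)
lad_a, lad_b = lad_fit(x, y)

for name, a, b in [("least squares", ls_a, ls_b), ("LAD", lad_a, lad_b)]:
    print(name, "intercept", round(a, 3), "slope", round(b, 3),
          "sum |error|", round(sum_abs_error(a, b, x, y), 3),
          "max |error|", round(max_abs_error(a, b, x, y), 3))
```

On data like these the two criteria give visibly different coefficients, and neither line is best under every loss. The coefficients are only useful when the criterion matches the losses that actually matter in the application.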

Alternatively, we may use permutation methods to test for the significance of the resulting coefficients. Provided that the $\{e_i\}$ are independent and identically distributed (Gaussian or not), the resulting p values will be exact. They will be exact regardless of which goodness-of-fit criterion is employed.

Suppose that our hypothesis is that $y_i = a + b x_i + e_i$ for all $i$ and $b = b_0$. First, we substitute $y'_i = y_i - b_0 x_i$ in place of the original observations $y_i$. Our translated hypothesis is $y'_i = a + b' x_i + e_i$ for all $i$ and $b' = 0$ or, equivalently, $r = 0$, where $r$ is the correlation between the variables $Y'$ and $X$. Our test for correlation is based on the permutation distribution of the sum of the cross-products $y'_i x_i$ (Pitman, 1938). Alternative tests based on permutations include those of Cade and Richards [1996], and tests based on MRPP LAD regression include those of Mielke and Berry [1997].

For large samples, these tests are every bit as sensitive as the least-squares

test described in the previous paragraph even when all the conditions for

applying that test are satisfied (Mielke and Berry, 2001, Section 5.4).
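The cross-product permutation test described above can be sketched in a few lines of Python. This is an illustrative Monte Carlo approximation that samples random permutations instead of enumerating all of them, so its p value is an estimate of the exact permutation p value. The function name, the default number of permutations, the two-sided scoring rule, and the data are all assumptions made for this example and are not taken from the cited papers.

```python
import random


def permutation_slope_test(x, y, b0=0.0, n_perm=5000, seed=0):
    """Two-sided Monte Carlo permutation test of b = b0 in y = a + b*x + e."""
    rng = random.Random(seed)
    # Translate the hypothesis: under b = b0, y' = y - b0*x is unrelated to x.
    y_adj = [yi - b0 * xi for xi, yi in zip(x, y)]

    def cross_product(ys):
        return sum(xi * yi for xi, yi in zip(x, ys))

    observed = cross_product(y_adj)
    # Mean of the cross-product sum over all permutations of y'.
    center = sum(x) * sum(y_adj) / len(x)
    threshold = abs(observed - center) - 1e-12  # tolerance for float ties

    shuffled = list(y_adj)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # rearrange y' relative to the fixed x values
        if abs(cross_product(shuffled) - center) >= threshold:
            extreme += 1
    # Count the observed arrangement itself so the estimate is never zero.
    return (extreme + 1) / (n_perm + 1)


# Hypothetical data generated as y = 1 + 2x plus small fixed disturbances.
x = list(range(12))
noise = [0.3, -0.5, 0.1, 0.4, -0.2, -0.6, 0.5, 0.0, -0.3, 0.2, -0.1, 0.6]
y = [1 + 2 * xi + ei for xi, ei in zip(x, noise)]

p_zero = permutation_slope_test(x, y, b0=0.0)  # hypothesis b = 0
p_true = permutation_slope_test(x, y, b0=2.0)  # hypothesis b = 2
print("p value for b = 0:", p_zero)
print("p value for b = 2:", p_true)
```

Because only the pairing of the $y'_i$ with the $x_i$ is rearranged, the procedure relies on the errors being exchangeable and makes no use of a Gaussian assumption.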

If the errors are dependent and normally distributed and the covariances are the same for every pair of errors, then we may also apply any of the permutation methods described above. If the errors are dependent and normally distributed, but we are reluctant to make such a strong assumption about the covariances, then our analysis may call for dynamic regression models (Pankratz, 1991).¹

FURTHER CONSIDERATIONS

Bad Data

The presence of bad data can completely distort regression calculations.

When least-squares methods are employed, a single outlier can influence

¹ In the SAS manual, these are called ARIMAX techniques and are incorporated in Proc ARIMA.
