Validation - Common Errors in Statistics

The prediction error is larger when the predictor data are far from their calibration-period means, and smaller when they are close to them. For simple linear regression, the standard error of the estimate $s_e$ and the standard error of prediction $s_{y^*}$ are related as follows:

$$
s_{y^*} = s_e \sqrt{\frac{n+1}{n} + \frac{(x_p - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}}
$$

where $n$ is the number of observations, $x_i$ is the $i$th value of the predictor in the calibration sample, $\bar{x}$ is the mean of those values, and $x_p$ is the value of the predictor used for the prediction.
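The relation between the two standard errors can be checked numerically. The following is a minimal sketch, assuming an ordinary least-squares line with an intercept and $s_e$ computed with $n - 2$ degrees of freedom; the function name `prediction_standard_error` and the sample data are invented for illustration.

```python
import math

def prediction_standard_error(x, y, x_p):
    """Return (s_e, s_y_star) for a least-squares line fitted to x and y."""
    n = len(x)
    if n < 3:
        raise ValueError("need at least three observations")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    # Residual sum of squares; two fitted parameters leave n - 2 degrees of freedom.
    rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    s_e = math.sqrt(rss / (n - 2))
    # Standard error of prediction at x_p, per the relation in the text.
    s_y_star = s_e * math.sqrt((n + 1) / n + (x_p - x_bar) ** 2 / s_xx)
    return s_e, s_y_star

# Hypothetical calibration sample.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

s_e, near = prediction_standard_error(x, y, x_p=3.0)   # at the calibration mean
_, far = prediction_standard_error(x, y, x_p=10.0)     # well outside the sample
print(s_e, near, far)
```

At the calibration mean the second term under the square root vanishes, so `near` is the smallest value the prediction error can take for this sample; `far` is larger because the squared distance from the mean grows.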

The relation between $s_{y^*}$ and $s_e$ is easily generalized to the multivariate case. In matrix terms, if $Y = AX + E$ and $y^* = Ax_p$, then $s_{y^*}^2 = s_e^2 \{1 + x_p^T (X^T X)^{-1} x_p\}$.
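The matrix form can be sketched in the same way. The code below is an illustration only. It assumes that the rows of `X` are observations, that an intercept is represented by a leading column of ones, and that the relation holds on the variance scale, so a square root is taken to return a standard error. The helper names `solve`, `leverage`, and `prediction_se` are hypothetical, and the small Gaussian-elimination routine stands in for a library solver.

```python
import math

def solve(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly so")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (m[r][n] - tail) / m[r][r]
    return z

def leverage(X, x_p):
    """Return x_p^T (X^T X)^{-1} x_p, with the rows of X as observations."""
    k = len(x_p)
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)]
           for i in range(k)]
    z = solve(xtx, x_p)  # z = (X^T X)^{-1} x_p without forming the inverse
    return sum(a * b for a, b in zip(x_p, z))

def prediction_se(s_e, X, x_p):
    """Standard error of prediction: s_e * sqrt(1 + leverage)."""
    return s_e * math.sqrt(1.0 + leverage(X, x_p))

# One predictor plus an intercept column, so the result can be compared
# with the simple linear regression relation.
X = [[1.0, xi] for xi in [1.0, 2.0, 3.0, 4.0, 5.0]]
h_mean = leverage(X, [1.0, 3.0])    # predictor at its calibration mean
h_far = leverage(X, [1.0, 10.0])    # predictor far from the calibration data
print(h_mean, h_far)
```

With a single predictor and an intercept, the quadratic form reduces to $1/n + (x_p - \bar{x})^2 / \sum (x_i - \bar{x})^2$, which is how the two versions of the relation agree.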

This equation is only applicable if the vector of predictors lies inside the multivariate cluster of observations on which the model was based. An important question is how “different” the predictor data can be from the values observed in the calibration period before the predictions are considered invalid.
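The text leaves that question open. One common heuristic, not taken from the source, is to compare the Mahalanobis distance of the new predictor vector with the largest distance seen in the calibration sample. The sketch below assumes exactly two predictors so that the covariance matrix can be inverted in closed form; the function name `mahalanobis_2d` and the data are invented for illustration.

```python
import math

def mahalanobis_2d(points, p):
    """Mahalanobis distance of p from the centroid of a set of 2-D points."""
    n = len(points)
    mx = sum(a for a, _ in points) / n
    my = sum(b for _, b in points) / n
    sxx = sum((a - mx) ** 2 for a, _ in points) / (n - 1)
    syy = sum((b - my) ** 2 for _, b in points) / (n - 1)
    sxy = sum((a - mx) * (b - my) for a, b in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    dx, dy = p[0] - mx, p[1] - my
    # Quadratic form d^T S^{-1} d, using the closed-form inverse of a 2x2 matrix.
    d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(d2)

# Hypothetical calibration data: the two predictors rise together.
calibration = [(1, 1.2), (2, 1.9), (3, 3.1), (4, 3.8), (5, 5.2)]

largest_seen = max(mahalanobis_2d(calibration, q) for q in calibration)
inside = mahalanobis_2d(calibration, (3, 3.0))
# (1, 5.0) is within the observed range of each predictor taken separately,
# yet it lies far from the cluster the two predictors form jointly.
outside = mahalanobis_2d(calibration, (1, 5.0))
print(inside, largest_seen, outside)
```

The second test point shows why checking each predictor's range separately is not enough: it passes both range checks yet is much farther from the cluster than any calibration point. Where to draw the cutoff remains a judgment call.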

LONG-TERM STABILITY

Time is a hidden dimension in most economic models. Many an airline

has discovered to its detriment that what was an optimal price today leads

to half-filled planes and markedly reduced profits tomorrow. A careful

reading of the newspapers lets them know a competitor has slashed prices,

but more advanced algorithms are needed to detect a slow shift in the tastes of prospective passengers. The public, tired of being treated no better than hogs,² turns to trains, personal automobiles, and teleconferencing.

An army base, used to a slow seasonal turnover in recruits, suddenly

finds that all infirmary beds are occupied and the morning lineup for sick

call stretches the length of a barracks.

To avoid a pound of cure:

• Treat every model as tentative, best described, as any lawyer will advise you, as subject to change without notice.

• Monitor continuously.
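One simple way to act on the second point is to track recent prediction errors and raise a flag when they stop averaging to zero. The sketch below is one possible scheme, not a method given in the text. The class name `DriftMonitor`, the window length, and the threshold multiplier `k` are all hypothetical choices, and it assumes that the errors of a stable model are roughly independent with standard deviation $s_e$.

```python
import math
from collections import deque

class DriftMonitor:
    """Flag drift when recent prediction errors stop averaging to zero."""

    def __init__(self, s_e, window=5, k=3.0):
        self.s_e = s_e                      # standard error from calibration
        self.k = k                          # threshold multiplier (assumed)
        self.errors = deque(maxlen=window)  # keeps only the latest errors

    def update(self, observed, predicted):
        """Record one new error; return True if the model looks stale."""
        self.errors.append(observed - predicted)
        if len(self.errors) < self.errors.maxlen:
            return False  # not enough evidence yet
        w = len(self.errors)
        mean_error = sum(self.errors) / w
        # For a stable model the mean of w errors has standard error s_e / sqrt(w).
        return abs(mean_error) > self.k * self.s_e / math.sqrt(w)

monitor = DriftMonitor(s_e=1.0, window=5, k=3.0)

# Errors scattered around zero: the model still fits.
stable = [monitor.update(obs, 10.0) for obs in [10.2, 9.7, 10.1, 9.9, 10.3]]

# A sustained shift of about two units: the flag is raised once the
# shifted observations dominate the window.
shifted = [monitor.update(obs, 10.0) for obs in [12.0, 12.5, 11.8, 12.2, 12.1]]
print(stable, shifted)
```

A short window reacts quickly but is noisy; a long one is steadier but slower to catch a gradual change of the kind the airline example describes.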

² Or somewhat worse, because hogs generally have a higher percentage of fresh air to breathe.
