Nevada. The calibration climatic data were 62 years of observed
precipitation and temperature (1928-1989) at Giant Forest/Grant Grove.
The model was validated by comparing the predictions with the 1873-1927
segments of three climate stations 90 km to the west in the San Joaquin
Valley. The climatic records of these stations were highly correlated with
those at Giant Forest/Grant Grove. Significant correlation of these
long-term station records with the 1873-1927 part of the reconstruction
was accepted as evidence of validation.
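The validation check described above amounts to correlating the reconstructed series with an independent station record over the years held back from calibration. A minimal sketch follows, assuming NumPy is available. The two series are synthetic stand-ins generated for illustration, not the study's data, and the helper name `pearson_r` is hypothetical.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))

# Synthetic stand-ins (NOT the study's data) for the 1873-1927 validation years.
rng = np.random.default_rng(0)
years = np.arange(1873, 1928)
climate_signal = rng.normal(size=years.size)  # shared underlying climate
reconstruction = climate_signal + 0.5 * rng.normal(size=years.size)
station_record = climate_signal + 0.5 * rng.normal(size=years.size)

r = pearson_r(reconstruction, station_record)
print(f"r = {r:.2f} over {years.size} validation years")
```

Whether a given value of r counts as significant depends on the number of years and on any serial correlation in the two series, which this sketch does not address.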
Independent verification can help discriminate among several models
that appear to provide equally good fits to the data. It can be used in
conjunction with either of the two other validation methods. For example,
an automobile manufacturer was trying to forecast parts sales. After
correcting for seasonal effects and long-term growth within each region,
ARIMA techniques were used.¹ A series of best-fitting ARIMA models was
derived, one model for each of the nine sales regions into which the sales
territory had been divided. The nine models were quite different in
nature. Because the regional seasonal effects and long-term growth trends
had been removed, a single ARIMA model applicable to all regions, albeit
with differing coefficients, was more plausible. Accordingly, the ARIMA
model that gave the best overall fit to all regions was used for
prediction purposes.
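The idea of one model form shared by all regions, with coefficients estimated separately for each, can be sketched as follows. This is an illustration under loud assumptions: a pure autoregressive AR(p) model fitted by least squares stands in for the full ARIMA family, "best overall fit" is taken to mean the smallest AIC summed over regions, and the nine series are simulated. None of these choices comes from the manufacturer's study, and the function names are hypothetical.

```python
import numpy as np

def fit_ar(series, p, max_p):
    """Least-squares AR(p) fit; returns (coefficients, AIC).

    All orders are scored on the same targets y[max_p:], so that AIC
    values for different p are comparable.
    """
    y = np.asarray(series, dtype=float)
    y = y - y.mean()
    # Column k holds the lag-(k+1) values aligned with the targets.
    X = np.column_stack([y[max_p - k - 1 : len(y) - k - 1] for k in range(p)])
    target = y[max_p:]
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ coef
    n = len(target)
    aic = n * np.log(resid @ resid / n) + 2 * p
    return coef, aic

def best_common_order(regions, max_p=4):
    """Pick the single AR order that minimises the AIC summed over regions."""
    totals = {p: sum(fit_ar(s, p, max_p)[1] for s in regions)
              for p in range(1, max_p + 1)}
    return min(totals, key=totals.get), totals

def simulate_ar2(a1, a2, n, rng):
    """Simulate an AR(2) series (illustrative data only)."""
    y = np.zeros(n)
    e = rng.normal(size=n)
    for t in range(2, n):
        y[t] = a1 * y[t - 1] + a2 * y[t - 2] + e[t]
    return y

# Nine simulated "regions": same model form, differing coefficients.
rng = np.random.default_rng(1)
regions = [simulate_ar2(a1, -0.4, 200, rng) for a1 in np.linspace(0.3, 0.7, 9)]

order, totals = best_common_order(regions)
# Refit each region with the common order to obtain its own coefficients.
coefs = [fit_ar(s, order, order)[0] for s in regions]
print("common order:", order)
for i, c in enumerate(coefs, start=1):
    print(f"region {i}: {np.round(c, 2)}")
```

Summing a fit criterion across regions is only one way to define an overall best fit. The design point is that the model structure is chosen once, jointly, while the coefficients remain region-specific.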
Independent verification also can be obtained through the use of
surrogate or proxy variables. For example, we may want to investigate past
climates and test a model of the evolution of a regional or worldwide
climate over time. We cannot go back directly to a period before direct
measurements of temperature and rainfall were made, but we can observe
the width of growth rings in long-lived trees or measure the amount of
carbon dioxide in ice cores.
Sample Splitting
Splitting the sample into two parts, one for estimating the model
parameters and the other for verification, is particularly appropriate for
validating time series models where the emphasis is on prediction or
reconstruction. If the observations form a time series, the more recent
observations should be reserved for validation purposes. Otherwise, the
data used for validation should be drawn at random from the entire sample.
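The two splitting rules above can be sketched in a few lines. This is a minimal illustration assuming NumPy. The function names and the choice of how many observations to hold out are hypothetical and not prescribed by the text.

```python
import numpy as np

def split_time_series(y, n_validate):
    """Reserve the most recent n_validate observations for validation."""
    y = np.asarray(y)
    if not 0 < n_validate < len(y):
        raise ValueError("n_validate must be between 1 and len(y) - 1")
    return y[:-n_validate], y[-n_validate:]

def split_at_random(data, n_validate, rng):
    """Draw n_validate observations at random, without replacement."""
    data = np.asarray(data)
    if not 0 < n_validate < len(data):
        raise ValueError("n_validate must be between 1 and len(data) - 1")
    idx = rng.permutation(len(data))
    return data[idx[n_validate:]], data[idx[:n_validate]]

observations = np.arange(10)  # placeholder data in time order

# Time series: estimate on the early part, validate on the recent part.
fit_part, check_part = split_time_series(observations, 3)

# No time ordering: validate on a random subset of the whole sample.
fit_rand, check_rand = split_at_random(observations, 3, np.random.default_rng(0))
```

Holding out the most recent observations mimics the prediction task itself: the model is estimated on the past and judged on data it has not seen.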
Unfortunately, when we split the sample and use only a portion of it for
estimation, the resulting estimates will be less precise.
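The size of this loss can be illustrated in the simplest case. Assume, purely for illustration, that the quantity being estimated is a mean of n independent observations with common variance sigma squared (time series observations are generally not independent, so this is only a rough guide). Estimating from half the sample then inflates the standard error by a factor of about 1.41:

```latex
\mathrm{SE}(\bar{x}_n) = \frac{\sigma}{\sqrt{n}},
\qquad
\mathrm{SE}(\bar{x}_{n/2}) = \frac{\sigma}{\sqrt{n/2}}
  = \sqrt{2}\,\frac{\sigma}{\sqrt{n}}
  \approx 1.41\,\mathrm{SE}(\bar{x}_n).
```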
Browne [1975] suggests we pool rather than split the sample if:
1 For examples and discussion of AutoRegressive Integrated Moving Average processes, see
Brockwell and Davis [1987].