7.1 Introduction
Regression modeling often requires many subjective decisions, such as the choice of transformation for each variable and the type and number of terms to include in the model. The transformations may be as simple as powers and cross-products or as sophisticated as indicator functions and splines. Sometimes the transformations are chosen to satisfy subjective criteria, such as approximate normality of the marginal distributions of the predictor variables. Further, model building is almost always an iterative process, with the fit of the model evaluated each time terms are added or deleted.
In statistical applications, a regression model is generally considered acceptable if it satisfies two criteria. The first is that the distribution of the residuals agrees with that specified by the model. In the case of least-squares regression, this usually means normality and variance homogeneity of the residuals. The whole subject of regression diagnostics is concerned with this problem (Belsley et al.). This criterion can be hard to achieve, however, in complex datasets without the fitted model becoming unwieldy. The second criterion, which is preferred almost exclusively in the machine learning literature, is that the model has low mean squared prediction error or, more generally, low prediction deviance.
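The two criteria can be made concrete with a small sketch. The data, the ordinary-least-squares fit, and the particular diagnostic summaries below are illustrative choices, not taken from the text: residual mean and a crude spread comparison stand in for criterion one, and in-sample mean squared error stands in for criterion two.

```python
# Sketch of the two acceptability criteria for a least-squares fit,
# on synthetic data (all names and numbers here are illustrative).
import numpy as np

rng = np.random.default_rng(0)
n = 500
x = rng.uniform(-2, 2, size=n)
X = np.column_stack([np.ones(n), x])          # intercept + one predictor
y = 3.0 + 1.5 * x + rng.normal(scale=1.0, size=n)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # ordinary least squares
resid = y - X @ beta

# Criterion 1: residuals should centre on zero with roughly constant spread.
mean_resid = resid.mean()                     # exactly 0 with an intercept
spread_ratio = resid[x < 0].std() / resid[x >= 0].std()  # crude homogeneity check

# Criterion 2: in-sample mean squared error (an optimistic deviance estimate).
mse = np.mean(resid ** 2)

print(round(mean_resid, 3), round(spread_ratio, 2), round(mse, 3))
```

Note that the in-sample error is optimistic; the cross-validation procedure described next corrects for this.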
If model selection is completely software based, the prediction deviance of an algorithm can be estimated by V-fold cross-validation as follows:
1. Randomly divide the dataset into V roughly equal parts.
2. Leaving out one part in turn, apply the algorithm to the observations in the remaining V − 1 parts to obtain a model.
3. Estimate the mean prediction deviance of each model by applying the left-out data to it.
4. Average the V estimates to get a cross-validation estimate for the model constructed from all the data.
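The four steps above can be sketched as follows. This is a minimal illustration, assuming a least-squares fit as the "algorithm" and mean squared prediction error as the deviance; the function name and synthetic data are my own.

```python
# Sketch of V-fold cross-validation for a least-squares model.
# "Deviance" here is taken to be mean squared prediction error.
import numpy as np

def cv_deviance(X, y, V=5, seed=0):
    """Estimate mean prediction squared error by V-fold cross-validation."""
    rng = np.random.default_rng(seed)
    n = len(y)
    folds = rng.permutation(n) % V          # step 1: random division into V parts
    errors = []
    for v in range(V):
        train, test = folds != v, folds == v
        # step 2: fit on the remaining V - 1 parts
        beta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
        # step 3: mean prediction deviance on the left-out part
        errors.append(np.mean((y[test] - X[test] @ beta) ** 2))
    return np.mean(errors)                  # step 4: average the V estimates

rng = np.random.default_rng(1)
X = np.column_stack([np.ones(200), rng.normal(size=(200, 2))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(scale=0.5, size=200)
print(round(cv_deviance(X, y), 3))
```

Because each observation is predicted by a model that never saw it, the estimate is close to the true noise variance rather than the optimistic in-sample error.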
The value of V may be quite small for very large datasets and as large as the sample size for small datasets. But cross-validation is impractical if the model is selected not by a computer algorithm but by a person making subjective decisions at each stage. In this case, penalty-based methods such as AIC (Akaike) are often employed. These methods select the model that minimizes the sum of the residual deviance and a penalty term times a measure of model complexity. Although the rationale makes sense, there is no consensus, and probably never will be, on the right value of the penalty term for all datasets.
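A penalty-based selection of this kind can be sketched as follows. The sketch assumes the common least-squares form AIC = n·log(RSS/n) + 2k, where k counts estimated coefficients; the candidate models and data are illustrative.

```python
# Sketch of AIC-based model selection among nested least-squares models.
# Assumed form: AIC = n * log(RSS / n) + 2 * k (k = number of coefficients).
import numpy as np

def aic(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    n, k = X.shape
    return n * np.log(rss / n) + 2 * k      # residual deviance + penalty * complexity

rng = np.random.default_rng(2)
n = 300
x1, x2, x3 = rng.normal(size=(3, n))
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(scale=0.5, size=n)  # x3 is irrelevant

candidates = {
    "x1":       np.column_stack([np.ones(n), x1]),
    "x1+x2":    np.column_stack([np.ones(n), x1, x2]),
    "x1+x2+x3": np.column_stack([np.ones(n), x1, x2, x3]),
}
best = min(candidates, key=lambda name: aic(candidates[name], y))
print(best)
```

The penalty of 2 per coefficient discourages the irrelevant x3; changing that penalty (e.g. to log n, as in BIC) can change which model wins, which is exactly the lack of consensus noted above.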
A separate, but no less important, problem is how to build a regression model that can be interpreted correctly and unambiguously. In practice, the majority of consumers of regression models are more interested in model interpretation than in optimal prediction accuracy. They want to know which predictor variables affect the response and how they do so. Sometimes they also want a rank ordering of the predictors according to the strength of their effects, although this question is meaningless without a more precise formulation. Nonetheless, it is a sad fact that the mod-