value obtained from the model. The smaller this sum of squares, the
better the fit.
If the observations are independent, then
$$\sum_i (y_i - y_i^*)^2 = \sum_i (y_i - \bar{y})^2 - \sum_i (\bar{y} - y_i^*)^2 .$$
The first sum on the right-hand side of the equation is the total sum
of squares (SST). Most statistics software uses as a measure of fit
$R^2 = 1 - \mathrm{SSE}/\mathrm{SST}$. The closer the value of $R^2$
is to 1, the better.
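As a minimal sketch of the definition above, the following Python function computes $R^2$ from the two sums of squares. The function name `r_squared` and the sample values are illustrative assumptions, not part of the original text; SSE is taken to be the sum of squared differences between the observations and the model's values.

```python
def r_squared(y, y_pred):
    """Return R^2 = 1 - SSE/SST for observations y and model values y_pred."""
    y_bar = sum(y) / len(y)
    # SSE: squared differences between observations and model values.
    sse = sum((yi - pi) ** 2 for yi, pi in zip(y, y_pred))
    # SST: squared differences between observations and their mean.
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - sse / sst


# Hypothetical data, for illustration only.
y = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]
print(r_squared(y, y_pred))
```

A model that simply predicts the mean of the observations has SSE equal to SST and therefore $R^2 = 0$.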
The automated entry of predictors into the regression equation using
$R^2$ runs the risk of overfitting, because $R^2$ can never decrease
when a predictor enters the model. To compensate, one may use the
adjusted $R^2$,
$$1 - \frac{(n - i)(1 - R^2)}{n - p},$$
where $n$ is the number of observations used in fitting the model,
$p$ is the number of estimated regression coefficients, and $i$ is an
indicator variable that is 1 if the model includes an intercept and
is 0 otherwise.
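The adjustment can be sketched in a few lines of Python. The function name `adjusted_r_squared` and its `intercept` flag are hypothetical names chosen for this illustration; the arithmetic follows the definitions of $n$, $p$, and $i$ given above.

```python
def adjusted_r_squared(r2, n, p, intercept=True):
    """Return 1 - (n - i)(1 - R^2)/(n - p).

    r2        -- the unadjusted R^2
    n         -- number of observations used in fitting the model
    p         -- number of estimated regression coefficients
    intercept -- whether the model includes an intercept (i = 1) or not (i = 0)
    """
    i = 1 if intercept else 0
    return 1.0 - (n - i) * (1.0 - r2) / (n - p)


# Hypothetical values: R^2 = 0.98 from 4 observations and 2 coefficients.
print(adjusted_r_squared(0.98, n=4, p=2))
```

For a fixed $R^2$ below 1, the adjusted value falls as $p$ grows, which is the intended penalty for adding predictors.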
The adjusted $R^2$ has two major drawbacks according to Rencher and
Pun [1980]:
1. The adjustment algorithm assumes the predictors are independent;
more often the predictors are correlated.
2. If the pool of potential predictors is large, multiple tests are
performed, and $R^2$ is inflated in consequence; the standard
algorithm for adjusted $R^2$ does not correct for this inflation.
A preferable method of guarding against overfitting the regression
model, proposed by Wilks [1995], is to use validation as a guide for
stopping the entry of additional predictors. Overfitting is judged to begin
when entry of an additional predictor fails to reduce the prediction error
in the validation sample.
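The stopping rule just described might be sketched as follows. This is not the procedure as published by Wilks; it is one plausible reading with several assumptions: predictors are entered by ordinary forward selection (the candidate giving the smallest training sum of squared errors enters next), every model includes an intercept, prediction error is measured as the sum of squared errors on the validation sample, and the predictors are not collinear (the small solver below would otherwise divide by zero). All function names are hypothetical.

```python
def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for k in range(n):
        piv = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[piv] = M[piv], M[k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= f * M[k][c]
    x = [0.0] * n
    for k in reversed(range(n)):
        s = sum(M[k][c] * x[c] for c in range(k + 1, n))
        x[k] = (M[k][n] - s) / M[k][k]
    return x


def design(X, cols):
    """Design-matrix rows: an intercept followed by the chosen predictor columns."""
    return [[1.0] + [row[c] for c in cols] for row in X]


def fit(X, y, cols):
    """Least-squares coefficients from the normal equations (A'A) beta = A'y."""
    A = design(X, cols)
    k = len(A[0])
    AtA = [[sum(a[i] * a[j] for a in A) for j in range(k)] for i in range(k)]
    Aty = [sum(a[i] * yi for a, yi in zip(A, y)) for i in range(k)]
    return solve(AtA, Aty)


def sse(X, y, cols, beta):
    """Sum of squared prediction errors of the fitted model on (X, y)."""
    return sum((yi - sum(b * a for b, a in zip(beta, row))) ** 2
               for row, yi in zip(design(X, cols), y))


def forward_select(X_train, y_train, X_val, y_val):
    """Enter predictors one at a time; stop when validation error stops falling."""
    selected = []
    remaining = list(range(len(X_train[0])))
    best_val = sse(X_val, y_val, selected, fit(X_train, y_train, selected))
    while remaining:
        # Candidate: the predictor that most reduces the training error.
        trial = {c: fit(X_train, y_train, selected + [c]) for c in remaining}
        c_best = min(remaining,
                     key=lambda c: sse(X_train, y_train, selected + [c], trial[c]))
        val = sse(X_val, y_val, selected + [c_best], trial[c_best])
        if val >= best_val:
            break  # no reduction in validation error: overfitting has begun
        selected.append(c_best)
        remaining.remove(c_best)
        best_val = val
    return selected, best_val
```

Calling `forward_select(X_train, y_train, X_val, y_val)` with lists of rows returns the indices of the predictors entered, in order of entry, together with the validation error of the final model.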
Mielke et al. [1997] propose the following measure of predictive
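accuracy for use with either a mean-square-deviation or a
mean-absolute-deviation loss function:
$$M = 1 - \frac{\delta}{\mu_\delta}, \quad \text{where} \quad \delta = \frac{1}{n}\sum_{i=1}^{n} |y_i - y_i^*| \quad \text{and} \quad \mu_\delta = \frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n} |y_i - y_j^*| .$$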
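A short Python sketch of this measure, using the absolute-deviation form shown in the formula, may help make the two averages concrete. The function name `mielke_m` and the sample data are assumptions made for illustration: $\delta$ averages the deviations between each observation and its own predicted value, while $\mu_\delta$ averages the deviations over all $n^2$ pairings of an observation with a predicted value.

```python
def mielke_m(y, y_pred):
    """Return M = 1 - delta/mu_delta using absolute deviations."""
    n = len(y)
    # delta: mean absolute deviation between matched pairs (y_i, y*_i).
    delta = sum(abs(yi - pi) for yi, pi in zip(y, y_pred)) / n
    # mu_delta: mean absolute deviation over all n^2 pairs (y_i, y*_j).
    mu_delta = sum(abs(yi - pj) for yi in y for pj in y_pred) / n ** 2
    return 1.0 - delta / mu_delta


# Hypothetical data, for illustration only.
y = [1.0, 2.0, 3.0]
print(mielke_m(y, [1.0, 2.0, 3.0]))  # predictions equal to the observations
print(mielke_m(y, [2.0, 2.0, 2.0]))  # a constant prediction
```

In this sketch, predictions that match the observations exactly give $M = 1$, and a constant prediction gives $M = 0$ because pairing observations with predicted values at random does no worse than the matched pairing.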
Uncertainty in Predictions
Whatever measure is used, the degree of uncertainty in your predictions
should be reported. Error bars are commonly used for this purpose.