Fig. 7.6 Plot representing the three deviations introduced in Table 7.6 for a linear regression
model
which are given by the Least Square approximation:
\[
\begin{bmatrix} c_0 \\ c_1 \\ \vdots \\ c_k \end{bmatrix}
= \left( M^T \times M \right)^{-1} \times M^T \times
\begin{bmatrix} Y[1] \\ Y[2] \\ \vdots \\ Y[n] \end{bmatrix}
\qquad (7.19)
\]
where M is the following matrix:
\[
M = \begin{bmatrix}
1 & X_1[1] & X_2[1] & \dots & X_k[1] \\
1 & X_1[2] & X_2[2] & \dots & X_k[2] \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & X_1[n] & X_2[n] & \dots & X_k[n]
\end{bmatrix}
\]
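As a concrete sketch of Eq. (7.19), the coefficients can be computed numerically by building the design matrix M and applying the normal equations. The data values below are hypothetical, chosen only to illustrate the computation:

```python
import numpy as np

# Hypothetical sample data: n = 5 observations of k = 2 independent variables.
X1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
X2 = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
Y = np.array([3.1, 4.9, 9.2, 10.8, 15.1])

# Design matrix M: a column of ones (for the intercept c0) followed by
# one column per independent variable, exactly as in the matrix above.
M = np.column_stack([np.ones_like(X1), X1, X2])

# Least-squares coefficients via the normal equations (Eq. 7.19):
# c = (M^T M)^{-1} M^T Y
c = np.linalg.inv(M.T @ M) @ M.T @ Y
print(c)  # estimated [c0, c1, ..., ck]
```

In practice, `np.linalg.lstsq` (or a QR decomposition) is preferred over explicitly inverting M^T M, since the explicit inverse is numerically less stable; the form above mirrors the equation for clarity.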
The correctness of the estimated values, with respect to the real ones, depends on the amount of unexplained deviation (i.e., the regression error), as indicated in Table 7.6 and displayed in Fig. 7.6.
The Mean Square Error (MSE) is an unbiased estimator of the variance of the errors. It is given by the ratio of the sum of squared errors (SSE) to the number of degrees of freedom associated with the regression model, that is, the number of data points minus the number of regression coefficients used in the model (n is the number of data points used for the regression):
\[
MSE = \frac{SSE}{n - (k+1)}
    = \frac{1}{n - (k+1)} \sum_{j=1}^{n} \left( y[j] - \hat{y}[j] \right)^2
\qquad (7.20)
\]
We remark here that when we use n data points, a regression model that uses n − 1 independent variables always reaches a perfect fit. However, when we do this we are overfitting our data, leaving no degrees of freedom for errors. In this case, we will fit