Fig. 7.6 Plot representing the three deviations introduced in Table 7.6 for a linear regression
model
which are given by the Least Square approximation:
\[
\begin{bmatrix} c_0 \\ c_1 \\ \vdots \\ c_k \end{bmatrix}
= \left( M^T \times M \right)^{-1} \times M^T \times
\begin{bmatrix} Y[1] \\ Y[2] \\ \vdots \\ Y[n] \end{bmatrix}
\qquad (7.19)
\]
where M is the following matrix:
\[
M = \begin{bmatrix}
1 & X_1[1] & X_2[1] & \dots & X_k[1] \\
1 & X_1[2] & X_2[2] & \dots & X_k[2] \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & X_1[n] & X_2[n] & \dots & X_k[n]
\end{bmatrix}
\]
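As a concrete sketch of Eq. (7.19), the coefficients can be computed numerically by building the design matrix M and applying the normal equations. The data values below are hypothetical, chosen only to illustrate the computation:

```python
import numpy as np

# Hypothetical sample data: n = 5 observations of k = 2 independent variables.
X1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
X2 = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
Y = np.array([3.1, 4.9, 9.2, 10.8, 15.1])

# Design matrix M: a column of ones (for the intercept c0) followed by
# one column per independent variable, exactly as in the matrix above.
M = np.column_stack([np.ones_like(X1), X1, X2])

# Least-squares coefficients via the normal equations (Eq. 7.19):
# c = (M^T M)^{-1} M^T Y
c = np.linalg.inv(M.T @ M) @ M.T @ Y
print(c)  # estimated [c0, c1, ..., ck]
```

In practice, `np.linalg.lstsq` (or a QR decomposition) is preferred over explicitly inverting M^T M, since the explicit inverse is numerically less stable; the form above mirrors the equation for clarity.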
The correctness of the estimated values, with respect to the real ones, depends on the amount of unexplained deviation (i.e., the regression error), as indicated in Table 7.6 and displayed in Fig. 7.6.
The Mean Square Error (MSE) is an unbiased estimator of the variance of the errors. It is given by the ratio of the sum of squared errors (SSE) to the number of degrees of freedom associated with the regression model, that is, the number of data points minus the number of regression coefficients used in the model (n is the number of data points used for the regression):
\[
MSE = \frac{SSE}{n - (k+1)}
    = \frac{1}{n - (k+1)} \sum_{j=1}^{n} \left( y[j] - \hat{y}[j] \right)^2
\qquad (7.20)
\]
We remark here that when we use n data points, a regression model that uses n − 1 independent variables always reaches a perfect fit. However, when we do this we are overfitting our data, leaving no degrees of freedom for errors. In this case, we will fit