Algorithms and Biorhythms - Infobiotics: Information in Biotic Systems


3.4.3 Problems Related to the Regression with LGSS

The size of the systems of equations solved by LGSS ranges from about a thousand equations to some hundreds of thousands, depending on the number of substances and reactions of the MP system under examination and on its time interval (a smaller time interval requires a longer time series and so a larger system of equations).³ The size of the regressor dictionary depends on the complexity of the phenomenon under investigation, but it usually comprises no more than one hundred regressors. The total computation time for a regression depends on the size of the regressor dictionary and on the number of equations to be solved. The computation usually ends in a few minutes (less than five minutes on average, using a common laptop with a single dual-core CPU and 4 GB of RAM), but it can increase to hours when the system is very big (i.e., a system with many hundreds of thousands of equations and a regressor dictionary of hundreds of regressors).

As explained in Sect. 7.7.1, the correctness of a multiple regression model is based on some assumptions about the independent variables and about the probability distribution of the errors associated with the observations. When one or more of these assumptions are not completely satisfied, some mistakes may occur in the definition of the regression model. There are three main problems that we need to be aware of in the context of multiple regression: (i) the problem of heteroscedasticity, (ii) the problem of residual autocorrelation, and, the most common in LGSS, (iii) the problem of multicollinearity [76, 216].

The problem of heteroscedasticity occurs when the variance of the residuals of the regression is not constant and is (directly or indirectly) proportional to the value of one or more independent variables of the model. This is in contrast with the assumption of uniform error variance made at the beginning of Sect. 7.7.1. When heteroscedasticity is present, our regression coefficient estimators are not efficient. This violation of the regression assumptions may sometimes be corrected by applying a transformation to the dependent variable Y or by replacing the ordinary least squares estimation method with the method of weighted least squares.
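The weighted least squares idea mentioned above can be illustrated with a minimal sketch. It is not part of LGSS: the function name `weighted_least_squares`, the single-regressor model y = a + b x, and the synthetic data are all assumptions made for illustration. Each observation is weighted by the inverse of its assumed error variance, so that noisier points influence the fit less.

```python
import random

def weighted_least_squares(x, y, w):
    """Fit y ~ a + b*x by minimizing sum(w_i * (y_i - a - b*x_i)**2).

    Hypothetical helper for illustration; uses the closed-form solution
    for one regressor plus an intercept.
    """
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw  # weighted mean of x
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw  # weighted mean of y
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

# Synthetic heteroscedastic data (assumed): y = 2 + 3x, noise std proportional to x.
rng = random.Random(0)
x = [1.0 + 0.1 * i for i in range(200)]
y = [2.0 + 3.0 * xi + rng.gauss(0.0, 0.5 * xi) for xi in x]

# Ordinary least squares is the special case of equal weights.
ols = weighted_least_squares(x, y, [1.0] * len(x))
# Weights proportional to 1 / variance, here 1 / x**2 by construction of the data.
wls = weighted_least_squares(x, y, [1.0 / xi ** 2 for xi in x])

print("OLS (a, b):", ols)
print("WLS (a, b):", wls)
```

In practice the error variances are not known in advance and the weights have to be estimated, for example from the residuals of a preliminary ordinary least squares fit.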

The problem of residual autocorrelation arises when the error ε depends on the observation points. It is called residual autocorrelation because it occurs when the time series of the error values is highly correlated with the values of the same series at certain previous steps [229].

As in the case of heteroscedasticity, ordinary least squares may fail in this case, so it can be useful to adopt another procedure, called generalized least squares.
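Before switching to generalized least squares, it helps to check whether the residuals are in fact autocorrelated. The sketch below is an illustration only, not taken from LGSS: it computes the lag-1 sample autocorrelation and the Durbin-Watson statistic of a residual series. The function names and the synthetic series are assumptions. A Durbin-Watson value near 2 suggests no first-order autocorrelation, while values well below 2 suggest positive autocorrelation.

```python
import random

def lag1_autocorrelation(residuals):
    """Sample correlation between the residual series and itself shifted by one step."""
    n = len(residuals)
    mean = sum(residuals) / n
    centered = [r - mean for r in residuals]
    num = sum(centered[t] * centered[t - 1] for t in range(1, n))
    den = sum(c * c for c in centered)
    return num / den

def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences over sum of squares."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(r * r for r in residuals)
    return num / den

# Synthetic residual series (assumed): independent noise versus an AR(1) process.
rng = random.Random(1)
white = [rng.gauss(0.0, 1.0) for _ in range(500)]
ar1 = [0.0]
for _ in range(499):
    # Each error carries over 90% of the previous one, plus fresh noise.
    ar1.append(0.9 * ar1[-1] + rng.gauss(0.0, 1.0))

print("white noise:", lag1_autocorrelation(white), durbin_watson(white))
print("AR(1) noise:", lag1_autocorrelation(ar1), durbin_watson(ar1))
```

For the independent series the autocorrelation should be close to 0 and the Durbin-Watson statistic close to 2; for the AR(1) series the autocorrelation should be large and the statistic well below 2.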

The most common problem related to regression in LGSS is the problem of multicollinearity. In multiple regression, we hope to have a strong correlation between each independent variable and the dependent variable Y, but we do not want the independent variables to be correlated with one another. In the case of perfect collinearity, the regression algorithm breaks down completely. Since in LGSS regulators usually are


³ An implementation of LGSS, as a set of MATLAB functions, was developed by Luca Marchetti in 2012 [108].
