3.4.3 Problems Related to the Regression with LGSS
The size of the systems of equations solved by LGSS ranges from about a thousand
equations to some hundreds of thousands, depending on the number of substances
and reactions of the MP system under examination and on its time interval (a smaller
time interval requires a longer time series and so a larger system of equations;
see footnote 3). The size of the regressor dictionary depends on the complexity of
the phenomenon under investigation, but usually it comprises no more than one
hundred regressors. The total
computation time for a regression depends on the size of the regressor dictionary and
on the number of equations to be solved. The computation usually ends in a
few minutes (less than five minutes on average, using a common laptop with a single
dual-core CPU and 4 GB of RAM), but it can increase to hours when the
system is very big (i.e. a system with many hundreds of thousands of equations and
a regressor dictionary of hundreds of regressors).
As explained in Sect. 7.7.1, the correctness of a multiple regression model is
based on some assumptions about the independent variables and about the
probability distribution of the errors associated with the observations. When one
or more of these assumptions are not completely satisfied, some mistakes may occur
in the definition of the regression model. There are three main problems that we
need to be aware of in the context of multiple regression: (i) the problem of
heteroscedasticity, (ii) the problem of residual autocorrelation, and, the most
common in LGSS, (iii) the problem of multicollinearity [76, 216].
The problem of heteroscedasticity occurs when the variance of the residuals of
the regression is not constant and is (directly or indirectly) proportional to the value
of one or more independent variables of the model. This is in contrast with the
assumption of uniform error variance made at the beginning of Sect. 7.7.1. When
heteroscedasticity is present, our regression coefficient estimators are not efficient.
This violation of the regression assumptions may sometimes be corrected by applying
a transformation to the dependent variable Y or by replacing the ordinary least
squares estimation method with the method of weighted least squares.
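
To make the weighted least squares idea concrete, the following is a minimal Python sketch, not the LGSS implementation. It fits a straight line by minimizing the weighted sum of squared residuals. The function name `weighted_least_squares_line`, the sample data, and the choice of weights (inverse of an error variance assumed proportional to the square of x) are all illustrative assumptions that do not come from the text.

```python
def weighted_least_squares_line(x, y, weights):
    """Fit y = intercept + slope * x by minimizing sum(w_i * residual_i**2).

    Hypothetical helper for illustration; weights are typically chosen as
    1 / (assumed error variance of observation i).
    """
    if not (len(x) == len(y) == len(weights)) or len(x) < 2:
        raise ValueError("x, y and weights must have the same length (>= 2)")

    # Weighted sums that appear in the normal equations.
    sw = sum(weights)
    swx = sum(w * xi for w, xi in zip(weights, x))
    swy = sum(w * yi for w, yi in zip(weights, y))
    swxx = sum(w * xi * xi for w, xi in zip(weights, x))
    swxy = sum(w * xi * yi for w, xi, yi in zip(weights, x, y))

    denom = sw * swxx - swx * swx
    if denom == 0:
        raise ValueError("x values are degenerate: slope is not identifiable")

    slope = (sw * swxy - swx * swy) / denom
    intercept = (swy - slope * swx) / sw
    return intercept, slope


# Made-up observations whose noise is ASSUMED to grow with x,
# so observations at small x receive larger weights.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [5.1, 6.8, 9.4, 10.5, 13.9]
weights = [1.0 / (xi ** 2) for xi in x]

intercept, slope = weighted_least_squares_line(x, y, weights)
print(f"intercept={intercept:.3f}, slope={slope:.3f}")
```

With all weights equal, the function reduces to ordinary least squares, so the weights are the only place where an assumption about the error variance enters.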
The problem of residual autocorrelation arises when the error ε depends on the
observation points, that is, when the time series of the error values is highly
correlated with the values of the same series at certain previous steps [229].
As in the case of heteroscedasticity, the ordinary least squares method may fail
in this case, so it can be useful to adopt another procedure called generalized
least squares.
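
Before switching estimation method, it can help to check whether the residuals are in fact autocorrelated. The sketch below is a diagnostic only and does not implement generalized least squares. It computes the lag-k sample autocorrelation of a residual series and the Durbin-Watson statistic, a standard test quantity that the text itself does not mention. The function names and the residual values are illustrative assumptions.

```python
def lag_autocorrelation(residuals, lag=1):
    """Sample autocorrelation of a residual series at the given lag."""
    n = len(residuals)
    if not 0 < lag < n:
        raise ValueError("lag must satisfy 0 < lag < len(residuals)")

    mean = sum(residuals) / n
    centered = [e - mean for e in residuals]
    denom = sum(c * c for c in centered)
    if denom == 0:
        raise ValueError("residuals are constant: autocorrelation undefined")

    # Products of each centered residual with the one `lag` steps earlier.
    numer = sum(centered[t] * centered[t - lag] for t in range(lag, n))
    return numer / denom


def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences
    divided by the sum of squared residuals.

    Values near 2 suggest no first-order autocorrelation; values toward 0
    suggest positive, and values toward 4 negative, autocorrelation.
    """
    denom = sum(e * e for e in residuals)
    if denom == 0:
        raise ValueError("all residuals are zero: statistic undefined")
    numer = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    return numer / denom


# Made-up residuals that stay positive, then negative: a slow drift
# of this kind is typical of positively autocorrelated errors.
residuals = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0]
print(lag_autocorrelation(residuals, lag=1))
print(durbin_watson(residuals))
```

A lag-1 autocorrelation far from zero, or a Durbin-Watson value far from 2, is a hint that the ordinary least squares assumptions are violated for the series at hand.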
The most common problem related to regression in LGSS is the problem of
multicollinearity. In multiple regression, we hope to have a strong correlation between
each independent variable and the dependent variable Y, but we do not want the
independent variables to be correlated with one another. In case of perfect collinearity,
the regression algorithm breaks down completely. Since in LGSS regulators usually are
3 An implementation of LGSS, as a set of MATLAB functions, was developed by Luca
Marchetti in 2012 [108].
 