1.3 Why Should You Read This Topic?
Over the last few decades, many studies have predicted and modelled different hydrological processes in the hydrological cycle using nonlinear techniques such as ANN, ANFIS and SVMs. Despite this, many questions remain unanswered. For example, to what extent do the inputs determine the output of a smooth model? Given an input vector x, how accurately can the output y be predicted? How many data points are required to make a prediction with the best possible accuracy? Which inputs are relevant to the prediction and which are irrelevant? The hydrological community has not yet adequately addressed these questions [39]. It has acknowledged that evaluating the available data, assessing data adequacy and deciding optimally on input selection are major, and potentially complicated, challenges in data-based modelling. A model's performance generally improves as more information is added during calibration. However, plateaus exist wherein new information adds little to the model's performance [77, 90]. In fact, a system's accuracy can fall as more information is added during validation [90], usually because the additional variables produce models with overfitting problems [98]. An overfitted model is very specific to the training set and performs poorly on the test set. Overfitting is a known problem with multivariate statistical methods when the data set contains too many predictor variables [98]. Such models give excellent results on the training data but very poor results on unseen test data. Therefore, an important question for modellers is which inputs are relevant to the prediction and which are irrelevant.
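A small synthetic experiment can reproduce the overfitting effect described above. It is an illustration only, not taken from the cited studies. The sketch assumes a single informative predictor (y = 2x plus Gaussian noise) and adds 30 columns of pure noise. It then fits ordinary least squares with NumPy with and without those columns. The function name `train_test_mse` and all data-generating settings are hypothetical choices made for this example.

```python
import numpy as np


def train_test_mse(n_train=40, n_test=2000, n_noise=30, seed=0):
    """Compare least-squares fits with and without irrelevant predictors.

    Assumed synthetic setup: y = 2*x + noise, plus n_noise extra columns of
    random numbers that carry no information about y.
    Returns {label: (train_mse, test_mse)}.
    """
    rng = np.random.default_rng(seed)

    def make(n):
        x = rng.normal(size=(n, 1))                 # the one relevant input
        junk = rng.normal(size=(n, n_noise))        # irrelevant inputs
        y = 2.0 * x[:, 0] + rng.normal(scale=0.5, size=n)
        return x, junk, y

    x_tr, j_tr, y_tr = make(n_train)
    x_te, j_te, y_te = make(n_test)

    def design(x, junk, use_junk):
        # Intercept column + relevant input (+ irrelevant inputs if requested).
        cols = [np.ones((x.shape[0], 1)), x] + ([junk] if use_junk else [])
        return np.hstack(cols)

    results = {}
    for label, use_junk in (("relevant only", False), ("with irrelevant", True)):
        A_tr = design(x_tr, j_tr, use_junk)
        beta, *_ = np.linalg.lstsq(A_tr, y_tr, rcond=None)
        mse_tr = np.mean((A_tr @ beta - y_tr) ** 2)
        mse_te = np.mean((design(x_te, j_te, use_junk) @ beta - y_te) ** 2)
        results[label] = (mse_tr, mse_te)
    return results
```

The two models are nested, so the training error with the extra columns can only be equal or lower. The test error, by contrast, typically rises well above that of the one-predictor model, because the 30 spurious coefficients are fitted to noise.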
However, advances in modern computing technology and a new algorithm from the computing science community, the Gamma Test [3, 56], may make significant progress on these problems possible. Evans and Jones [33] give a formal proof of the Gamma Test (GT). The test estimates the variance of the noise, var(r), directly from the raw data using efficient, scalable algorithms. This novel technique lets us quickly estimate the best mean squared error that a smooth model can achieve on unseen data for a given selection of inputs, before any model is built. It can also be used to find the best embedding dimensions and time lags for time series analysis. This information helps us determine the best input combinations for a particular target output. The Gamma Test can also help avoid overtraining, which is considered one of the serious weaknesses of almost all nonlinear modelling techniques, including ANN. The test addresses this problem efficiently by estimating how closely any smooth model could fit the unseen data. This removes the guesswork associated with nonlinear curve-fitting techniques. This book uses these capabilities for input selection and redundancy assessment when a large number of input series are available for modelling.
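A minimal Python/NumPy sketch of the Gamma statistic may make the Gamma Test described above more concrete. It follows the near-neighbour formulation commonly given in the Gamma Test literature, which this section does not spell out, so treat it as an assumption. The steps are:

1. For each point, find its k-th nearest neighbour in input space, for k = 1..p.
2. Compute δ(k), the mean squared input distance to that neighbour.
3. Compute γ(k), half the mean squared output difference to that neighbour.
4. Fit a least-squares line through the (δ, γ) pairs. Its intercept Γ is the estimate of var(r).

The function names `gamma_test` and `best_input_mask`, the default p = 10 and the brute-force subset search are illustrative choices, not a reference implementation.

```python
import itertools

import numpy as np


def gamma_test(X, y, p=10):
    """Sketch of the Gamma Test: estimate var(r) for y = f(X) + r.

    X: (M, m) array of inputs (a 1-D array is treated as one input column).
    y: (M,) array of outputs. p: number of near neighbours (assumed default).
    Returns (Gamma, A, V_ratio): intercept, slope and Gamma / var(y).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    M = X.shape[0]
    if M <= p:
        raise ValueError("need more data points than near neighbours p")

    # Squared Euclidean distances between all pairs of input vectors.
    # O(M^2) memory: acceptable for a sketch, not for very large M.
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbour

    # Column k-1 holds the index of each point's k-th nearest neighbour.
    nbr = np.argsort(d2, axis=1)[:, :p]
    rows = np.arange(M)[:, None]

    # delta(k): mean squared input distance to the k-th nearest neighbour.
    delta = d2[rows, nbr].mean(axis=0)
    # gamma(k): half the mean squared output difference to that neighbour.
    gamma = 0.5 * ((y[nbr] - y[:, None]) ** 2).mean(axis=0)

    # Fit gamma = A * delta + Gamma; the intercept Gamma estimates var(r).
    A, Gamma = np.polyfit(delta, gamma, 1)
    return Gamma, A, Gamma / np.var(y)


def best_input_mask(X, y, p=10):
    """Hypothetical helper: try every non-empty subset of input columns
    and return (columns, Gamma) for the subset with the smallest Gamma.
    Runs the test 2**m - 1 times, so it only suits a few inputs."""
    X = np.asarray(X, dtype=float)
    best = None
    for r in range(1, X.shape[1] + 1):
        for cols in itertools.combinations(range(X.shape[1]), r):
            g = gamma_test(X[:, list(cols)], y, p)[0]
            if best is None or g < best[1]:
                best = (cols, g)
    return best
```

For example, take two inputs x1 and x2 where y = sin(2πx1) plus noise of variance 0.01. Γ for {x1} should come out close to 0.01, while Γ for {x2} alone stays near var(y), flagging x2 as irrelevant. Picking the subset with the smallest Γ is only one simple heuristic. The all-pairs distance matrix keeps the sketch short, but a k-d tree is the usual choice for large data sets.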