Introduction - Hydrological Data Driven Modelling

1.3 Why Read This Topic?

Despite an abundance of studies in the last few decades on predicting and
modelling the processes of the hydrological cycle with nonlinear techniques
such as ANN, ANFIS and SVMs, many questions remain unanswered. For example, to
what extent do the inputs determine the output through a smooth model? Given an
input vector x, how accurately can the output y be predicted? How many data
points are required to make a prediction with the best possible accuracy? Which
inputs are relevant in making the prediction and which are irrelevant? These
questions have not been adequately addressed by the hydrological community
[39]. The community acknowledges that evaluating the available data, assessing
data adequacy and deciding on the optimum input selection are the main
challenges, and potentially complicated questions, in data-based modelling.
Although the performance of a model generally improves as more information is
added during calibration, plateaus exist where new information adds little to
the model's performance [77, 90]. In fact, a system's accuracy during
validation can fall as more information is added [90], usually because the
additional variables produce models with overfitting problems [98]. An
overfitted model is very specific to the training set and performs poorly on
the test set. Overfitting is a known problem with multivariate statistical
methods when the data set contains too many predictor variables [98]: results
on the training data are excellent, but results on the unseen test data are
very poor. Therefore, an important question for modellers is which inputs are
relevant in making the prediction and which are irrelevant.
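The overfitting effect described above can be reproduced on synthetic data. The following is a minimal sketch, not a method taken from this text: it assumes ordinary least squares as a stand-in for the multivariate statistical methods mentioned, a target that depends on the first input only, and hypothetical names (`make_data`, `fit_ols`, `overfitting_demo`) chosen purely for illustration.

```python
import random


def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(rows, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(k)]
    return solve_linear(xtx, xty)


def mse(coef, rows, y):
    """Mean squared error of the linear model on the given data."""
    preds = [coef[0] + sum(c * v for c, v in zip(coef[1:], r)) for r in rows]
    return sum((p - t) ** 2 for p, t in zip(preds, y)) / len(y)


def make_data(rng, n, n_inputs):
    """Synthetic data (an assumption): only the first input drives the output."""
    rows = [[rng.gauss(0.0, 1.0) for _ in range(n_inputs)] for _ in range(n)]
    y = [2.0 * r[0] + rng.gauss(0.0, 0.5) for r in rows]
    return rows, y


def overfitting_demo(seed=0, n_train=30, n_test=500, max_inputs=25):
    """Return {number of inputs used: (training MSE, test MSE)}."""
    rng = random.Random(seed)
    train_x, train_y = make_data(rng, n_train, max_inputs)
    test_x, test_y = make_data(rng, n_test, max_inputs)
    results = {}
    for k in (1, 5, 15, max_inputs):
        coef = fit_ols([r[:k] for r in train_x], train_y)
        results[k] = (
            mse(coef, [r[:k] for r in train_x], train_y),
            mse(coef, [r[:k] for r in test_x], test_y),
        )
    return results


if __name__ == "__main__":
    for k, (train_err, test_err) in overfitting_demo().items():
        print(f"{k:2d} inputs: train MSE = {train_err:.3f}, test MSE = {test_err:.3f}")
```

With only 30 training points, adding irrelevant inputs should keep lowering the training error while the error on the unseen test set grows, which is the pattern the paragraph above describes.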

However, due to the advancement of modern computing technology and a new
algorithm from the computing science community called the Gamma Test [3, 56],
it is possible that we could make significant progress in tackling these
problems.

A formal proof for the Gamma Test (GT) can be found in Evans and Jones [33].
The test works by estimating the variance of the noise, var(r), directly from
the raw data using efficient, scalable algorithms. This novel technique enables
us to estimate quickly the best mean squared error that a smooth model can
achieve on unseen data for a given selection of inputs, prior to model
construction. It can also be used to find the best embedding dimensions and
time lags for time series analysis. This information helps us determine the
best input combinations for a particular target output. The Gamma Test can
help avoid overtraining, which is considered one of the serious weaknesses of
almost all nonlinear modelling techniques, including ANN. It is designed to
address this problem efficiently by giving an estimate of how closely any
smooth model could fit the unseen data. Thus we can avoid the guesswork
associated with nonlinear curve-fitting techniques. This book makes use of
these capabilities for input selection and redundancy assessment when a large
number of input series are available for modelling.
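To make the idea of estimating var(r) from raw data concrete, here is a minimal sketch of the near-neighbour formulation in which the Gamma Test is commonly presented. This section does not spell out the algorithm, so the details are assumptions: for each k up to p, the mean squared input distance to the k-th nearest neighbour (delta) is paired with half the mean squared difference of the corresponding outputs (gamma), and the intercept of the regression line of gamma on delta is taken as the noise variance estimate. The function name `gamma_test`, the choice p = 10, the brute-force neighbour search and the synthetic data are all illustrative.

```python
import math
import random


def gamma_test(inputs, outputs, p=10):
    """Return (intercept, slope) of the gamma-on-delta regression line.

    inputs  : list of input vectors (each a list of floats)
    outputs : list of scalar outputs
    p       : number of near neighbours used (an assumed default)
    The intercept is the estimate of the noise variance var(r).
    """
    m = len(inputs)
    delta = [0.0] * p
    gamma = [0.0] * p
    for i in range(m):
        # Brute-force search: squared distance from point i to every other point.
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(inputs[i], inputs[j])), j)
            for j in range(m)
            if j != i
        )
        for k in range(p):
            d2, j = dists[k]
            delta[k] += d2 / m
            gamma[k] += (outputs[j] - outputs[i]) ** 2 / (2 * m)
    # Least-squares line gamma = intercept + slope * delta through the p points.
    mean_d = sum(delta) / p
    mean_g = sum(gamma) / p
    sxx = sum((d - mean_d) ** 2 for d in delta)
    sxy = sum((d - mean_d) * (g - mean_g) for d, g in zip(delta, gamma))
    slope = sxy / sxx
    return mean_g - slope * mean_d, slope


def demo(seed=1, m=500, noise_sd=0.2):
    """Compare a relevant input with an irrelevant one on synthetic data."""
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(m)]
    ys = [math.sin(x) + rng.gauss(0.0, noise_sd) for x in xs]
    junk = [rng.uniform(0.0, 1.0) for _ in range(m)]  # unrelated to ys
    relevant, _ = gamma_test([[x] for x in xs], ys)
    irrelevant, _ = gamma_test([[u] for u in junk], ys)
    return relevant, irrelevant


if __name__ == "__main__":
    relevant, irrelevant = demo()
    print(f"true noise variance    : {0.2 ** 2:.3f}")
    print(f"Gamma, relevant input  : {relevant:.3f}")
    print(f"Gamma, irrelevant input: {irrelevant:.3f}")
```

In this sketch the estimate for the relevant input should lie close to the true noise variance, while the irrelevant input should give a value near the full variance of the output. Comparing such estimates across candidate input sets is one way the test can guide input selection before any model is built.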
