Geology Reference
In-Depth Information
Fig. 2.2 Partition procedure adopted conventionally in learning to prevent overfitting
The traditional data partitioning method adopted in most of the literature to
prevent overfitting is shown in Fig. 2.2.
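Since the figure is not reproduced here, the following is only a minimal sketch of one common partitioning scheme, assuming a three-way chronological split into training, validation, and test blocks. The function name chronological_split and the 60/20/20 fractions are illustrative assumptions, not values taken from the text.

```python
import numpy as np

def chronological_split(series, train_frac=0.6, val_frac=0.2):
    """Split a record into contiguous training, validation, and test blocks.

    The fractions are illustrative assumptions; the remainder after the
    training and validation blocks becomes the test block.
    """
    if not (0 < train_frac < 1 and 0 < val_frac < 1 and train_frac + val_frac < 1):
        raise ValueError("fractions must be positive and sum to less than 1")
    data = np.asarray(series)
    n = len(data)
    n_train = int(round(n * train_frac))
    n_val = int(round(n * val_frac))
    # Keep the time order intact: no shuffling across block boundaries.
    train = data[:n_train]
    val = data[n_train:n_train + n_val]
    test = data[n_train + n_val:]
    return train, val, test

# Hypothetical record of 100 observations.
record = np.arange(100, dtype=float)
train, val, test = chronological_split(record)
print(len(train), len(val), len(test))
```

In such a scheme the validation block is typically used to decide when to stop training, while the test block is held back for the final assessment.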
2.3 Input Variable (Data) Selection
One of the serious problems encountered in data-based modeling in hydrology,
using either traditional or intelligent approaches, is the choice of independent
variables or data series from the available data pool for inclusion in the predictive
model. Although overfitting issues in the previous section are addressed in the
context of neural models, studies have shown that overfitting is not confined just to
neural models with hidden units. Overfitting can occur even in generalized linear
models with no hidden nodes or layers because of improper selection of inputs.
Multicollinearity is another weakness in data-based modeling. It is a statistical
situation arising from the presence of input variables or data series in the input
architecture that are highly correlated with each other. Although this is the
situation in most of the studies in water resources using data-based models, little
or no attention is given to the process of selecting a better input structure for
the model [ 49 ]. The lack of a methodological approach to selecting the significant
inputs may lead to the modeling issues listed below:
1. Increase in input dimensionality: using all available inputs indiscriminately
increases computational complexity and can exhaust the available memory.
2. Presence of more local minima in the error surface due to the inclusion of
irrelevant data points.
3. Early convergence or divergence due to the presence of irrelevant data, which
leads to poor model accuracy.
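One simple way to check a candidate input set for multicollinearity is sketched below. It is not a method prescribed by the text: it assumes pairwise Pearson correlation and the variance inflation factor (VIF, computed as the diagonal of the inverse correlation matrix) are acceptable diagnostics. The variable names, the synthetic data, and the 0.9 threshold are hypothetical choices for illustration.

```python
import numpy as np

def correlated_pairs(X, names, threshold=0.9):
    """Return (name_i, name_j, r) for input pairs with |r| >= threshold."""
    corr = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(corr[i, j]) >= threshold:
                pairs.append((names[i], names[j], float(corr[i, j])))
    return pairs

def variance_inflation_factors(X):
    """VIF of each column, taken from the inverse of the correlation matrix."""
    corr = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    # Fails (or returns huge values) if two inputs are exactly collinear.
    return np.diag(np.linalg.inv(corr))

# Synthetic, hypothetical inputs: two nearly identical rainfall series
# and one unrelated temperature series.
rng = np.random.default_rng(0)
rain_a = rng.normal(size=500)
rain_b = rain_a + 0.05 * rng.normal(size=500)
temperature = rng.normal(size=500)

names = ["rain_a", "rain_b", "temperature"]
X = np.column_stack([rain_a, rain_b, temperature])

for name_i, name_j, r in correlated_pairs(X, names):
    print(f"{name_i} and {name_j} are highly correlated (r = {r:.3f})")
for name, vif in zip(names, variance_inflation_factors(X)):
    print(f"VIF({name}) = {vif:.1f}")
```

A large VIF indicates that an input is largely explained by the other inputs, making it a candidate for removal; the cut-off used to decide this is a judgment call rather than a fixed rule.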
Maier and Dandy [ 49 ] have highlighted the fact that issues relating to the
optimal division of the available data, data pre-processing, and the choice of
appropriate model
inputs are seldom considered in data-based modeling and
 