of the
first. These sets of uncorrelated variables (principal components) can be
ordered by decreasing variability, and the last few of these variables can be
removed with minimal loss of information from the real data.
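The idea of ordering components by variability and dropping the last few can be sketched as follows. This is a minimal illustration, not a full principal component analysis: it assumes the variance of each component (its eigenvalue) has already been computed, and the function name `components_to_keep`, the `min_fraction` threshold, and the example variances are all hypothetical.

```python
def components_to_keep(variances, min_fraction=0.95):
    """Return the indices of the components to retain, largest variance first.

    `variances` holds one variance per principal component. Components are
    added in order of decreasing variance until the retained share of the
    total variance reaches `min_fraction` (an assumed, user-chosen threshold).
    """
    total = sum(variances)
    if total <= 0:
        raise ValueError("total variance must be positive")

    # Order component indices from largest to smallest variance.
    order = sorted(range(len(variances)),
                   key=lambda i: variances[i], reverse=True)

    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += variances[i]
        if cumulative / total >= min_fraction:
            break  # the remaining low-variance components are dropped
    return kept


# Hypothetical component variances, not taken from the text.
example_variances = [0.2, 5.0, 0.3, 2.5, 2.0]
kept = components_to_keep(example_variances, min_fraction=0.9)
```

Here the two components with the smallest variance are discarded, while the three retained components account for at least 90% of the total variance.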
3.6 Traditional Approaches in Data and Model Selection
The primary objective of data selection is to secure the appropriate types and
sources of data so that investigators can adequately answer the intended modeling
questions. Selecting representative data from the available data pool has
traditionally been a difficult issue in all scientific disciplines, including
hydrology. The literature offers a wide variety of sampling approaches for
reducing the likelihood of drawing a biased sample, including simple random
sampling, stratified sampling, cluster sampling, and systematic sampling. Some of
these approaches are more suitable for qualitative research, and their results
might be contradictory when applied to quantitative research. Validation
approaches are useful for assessing how accurately a predictive model will
perform and how effective a given split of the data is for training. In real-life
problems, where only limited samples of data are available, they are very useful
for estimating the model input dimension that will provide the lowest error rate
and the most stable results in future. Cross-validation and bootstrapping are the
common approaches for this kind of work, and they are considered better than a
residual approach. The major types of cross-validation approach (CVA) are the
holdout method, K-fold cross-validation, and leave-one-out cross-validation.
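K-fold and leave-one-out cross-validation can be sketched in a few lines. The code below is an illustrative sketch under stated assumptions, not the procedure used in the text: the helper names (`k_fold_indices`, `cross_validate`, `fit_mean`, `predict_mean`), the random shuffling, the mean-squared-error score, and the toy "predict the training mean" model are all hypothetical choices. Leave-one-out is obtained by setting `k` equal to the number of samples.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold j takes every k-th shuffled index starting at position j,
    # so fold sizes differ by at most one sample.
    folds = [indices[j::k] for j in range(k)]
    for j in range(k):
        test = folds[j]
        train = [i for m, fold in enumerate(folds) if m != j for i in fold]
        yield train, test


def cross_validate(xs, ys, fit, predict, k, seed=0):
    """Return the mean squared error averaged over the k test folds."""
    fold_errors = []
    for train, test in k_fold_indices(len(xs), k, seed):
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        squared = [(predict(model, xs[i]) - ys[i]) ** 2 for i in test]
        fold_errors.append(sum(squared) / len(squared))
    return sum(fold_errors) / len(fold_errors)


# Toy model (an assumption for illustration): always predict the training mean.
def fit_mean(train_xs, train_ys):
    return sum(train_ys) / len(train_ys)


def predict_mean(model, x):
    return model


xs = [1, 2, 3, 4, 5, 6]
ys = [2.0, 4.1, 5.9, 8.2, 9.9, 12.1]
three_fold_error = cross_validate(xs, ys, fit_mean, predict_mean, k=3)
leave_one_out_error = cross_validate(xs, ys, fit_mean, predict_mean, k=len(xs))
```

Each sample is used for testing exactly once, so the averaged error uses all of the available data, at the cost of fitting the model `k` times.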
3.6.1 The Holdout Method
The holdout method is commonly used with artificial neural networks (ANN), where
a partial set of the data is used for training, as suggested by Donald F. Specht
[69]. It is considered to be the simplest kind of cross-validation. The holdout
method reserves a certain amount of the data for testing (the testing data) and
uses the remainder for training (normally half to two-thirds of the whole data),
as shown in Fig. 3.6. The advantage of this method is that it is less
computationally intensive than K-fold or leave-one-out cross-validation, since
the model is trained only once, and it is preferable to the residual method.
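A holdout split of this kind might look like the sketch below. It is a minimal illustration only: the function name `holdout_split`, the random shuffling, the fixed seed, and the default training fraction of two-thirds (the upper end of the range quoted above) are assumptions rather than details given in the text.

```python
import random


def holdout_split(data, train_fraction=2 / 3, seed=0):
    """Split `data` once into a training set and a testing set."""
    if not 0 < train_fraction < 1:
        raise ValueError("train_fraction must lie strictly between 0 and 1")
    shuffled = list(data)
    # Shuffle a copy so the split does not depend on the original ordering.
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


samples = list(range(9))  # hypothetical sample identifiers
train_set, test_set = holdout_split(samples)
```

The model is then fitted on `train_set` only and evaluated once on `test_set`. For time-ordered records, a chronological split may be more appropriate than a shuffled one.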
 