However, its evaluation can have high variance because the samples may not be
representative. The evaluation depends largely on which data points end up in the
training and test sets.
The Repeated Holdout Method is a modified version of the basic concept described
above. It attempts to make holdout estimates more reliable by repeating the process
with different resamplings of the data. This version commonly uses stratified
sampling to ensure that each class is represented in approximately the same
proportion in both subsets. The error rates from the individual iterations are
averaged to yield an overall error rate. However, this version is not completely
free from bias in the training and test sets. Another disadvantage is that the test
sets of different iterations can overlap.
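The procedure can be illustrated with a minimal sketch, assuming a toy
one-dimensional data set and a simple nearest-centroid classifier. Neither comes
from the text, and the function names (stratified_split, nearest_centroid_error,
repeated_holdout) are hypothetical, chosen only for illustration. Each repetition
draws a new stratified split, and the per-split error rates are averaged.

```python
import random
from collections import defaultdict
from statistics import mean


def stratified_split(labels, test_fraction, rng):
    """Return (train_idx, test_idx), splitting every class in the same proportion."""
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = max(1, round(len(indices) * test_fraction))
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return train_idx, test_idx


def nearest_centroid_error(x, y, train_idx, test_idx):
    """Fit 1-D class centroids on the training part; return the test error rate."""
    values = defaultdict(list)
    for i in train_idx:
        values[y[i]].append(x[i])
    centroids = {c: mean(v) for c, v in values.items()}
    wrong = sum(
        1
        for i in test_idx
        if min(centroids, key=lambda c: abs(x[i] - centroids[c])) != y[i]
    )
    return wrong / len(test_idx)


def repeated_holdout(x, y, n_repeats=20, test_fraction=0.3, seed=0):
    """Average the holdout error over n_repeats independent stratified splits."""
    rng = random.Random(seed)
    errors = []
    for _ in range(n_repeats):
        train_idx, test_idx = stratified_split(y, test_fraction, rng)
        errors.append(nearest_centroid_error(x, y, train_idx, test_idx))
    return mean(errors), errors


# Toy, imbalanced data (an assumption for illustration): 80 vs. 20 samples.
data_rng = random.Random(42)
y = [0] * 80 + [1] * 20
x = [data_rng.gauss(0.0, 1.0) if label == 0 else data_rng.gauss(3.0, 1.0) for label in y]

overall_error, per_split_errors = repeated_holdout(x, y)
print(f"overall error: {overall_error:.3f}")
print(f"per-split range: {min(per_split_errors):.3f} to {max(per_split_errors):.3f}")
```

The spread of the per-split errors shows how much a single holdout estimate depends
on the particular split, while the average is the overall error rate. Because every
split is drawn independently, the same sample can appear in several test sets.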
3.6.2 Random Sub-sampling
This is another well-known CVA. Random sub-sampling is also known in the literature
as Monte Carlo cross-validation or repeated evaluation set [61]. In this approach,
the whole data set is randomly split into subsets (as shown in Fig. 3.7), with the
size of the subsets chosen arbitrarily by the user. Some research suggests that
random sub-sampling is asymptotically consistent, giving more pessimistic
predictions on the test data than conventional full cross-validation and more
realistic estimates of predictions on external validation data [66, 82].
Fig. 3.7 Data splitting in the random sub-sampling approach
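The splitting step of random sub-sampling can be sketched as follows. This is a
minimal illustration, not a reference implementation: the function name
random_subsampling_splits and its parameters are hypothetical, and the sizes (10
samples, test sets of 3, 5 repetitions) are arbitrary user choices.

```python
import random
from collections import Counter


def random_subsampling_splits(n_samples, test_size, n_splits, seed=0):
    """Yield (train_idx, test_idx) pairs; every split is drawn independently."""
    if not 0 < test_size < n_samples:
        raise ValueError("test_size must be between 1 and n_samples - 1")
    rng = random.Random(seed)
    indices = list(range(n_samples))
    for _ in range(n_splits):
        shuffled = rng.sample(indices, n_samples)  # a fresh random permutation
        yield sorted(shuffled[test_size:]), sorted(shuffled[:test_size])


splits = list(random_subsampling_splits(n_samples=10, test_size=3, n_splits=5))

# Count how many test sets each sample index falls into.
test_counts = Counter(i for _, test_idx in splits for i in test_idx)

for train_idx, test_idx in splits:
    print("train:", train_idx, "test:", test_idx)
print("times each sample was tested:", dict(sorted(test_counts.items())))
```

Unlike k-fold cross-validation, the test sets here do not partition the data. With
15 test slots spread over 10 samples, at least one sample must be tested more than
once, and some samples may never be tested at all.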