3.6.3 K-Fold Cross-Validation
K-fold cross-validation is the most popular sub-sampling technique. The concept of
K-fold cross-validation is not new and it is reported by Breiman et al. [ 10 ]. Based
on their detailed simulation studies on this concept, they concluded that these
methods do not always work. Even now, detailed research continues to find the
values of K for which K-fold cross-validation works best. Some research has shown
that the success in determining the K value is highly arbitrary and depends on the
experimental settings [12]. In K-fold cross-validation, the data are split into
K equal parts; one part is held out as the test set and the remaining K − 1 parts
form the training set. In the next experiment a different part is held out, and
this procedure is repeated K times, with the error estimated in each run. The
true error is then estimated from the predictions on the K test sets by
averaging the errors of the individual experiments. A pictorial representation of
K-fold cross-validation is given in Fig. 3.8 . If we use a large number of folds for
modeling, the bias of the true-error-rate estimator will be small, whereas its
variance will be large. Conversely, with a small number of folds the number of
experiments, and therefore the computation time, is reduced; the variance of the
estimator is small, but its bias is larger, so it tends to overestimate the true
error rate. The common practice for K-fold cross-validation is K = 10, which was
adopted in this thesis.
Fig. 3.8 Data splitting in K-fold cross-validation
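The procedure described above can be sketched in a few lines of code. The sketch below, a hypothetical illustration not taken from this thesis, splits the data into K folds, holds out each fold in turn, and averages the per-fold errors; the "model" is a deliberately trivial one (predict the training mean) so that only the cross-validation mechanics are on display.

```python
def kfold_indices(n, k):
    """Split indices 0..n-1 into k (nearly) equal contiguous folds."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_validate(data, k):
    """Estimate the true error by averaging the test error over k folds."""
    folds = kfold_indices(len(data), k)
    errors = []
    for i, test_idx in enumerate(folds):
        # All folds except the i-th form the training set.
        train = [data[j] for f in folds[:i] + folds[i + 1:] for j in f]
        mean = sum(train) / len(train)  # "train" the trivial mean model
        test = [data[j] for j in test_idx]
        # Mean squared error on the held-out fold.
        mse = sum((x - mean) ** 2 for x in test) / len(test)
        errors.append(mse)
    # Average over the k experiments to estimate the true error.
    return sum(errors) / len(errors)
```

For instance, `cross_validate(measurements, 10)` would reproduce the K = 10 setting used in this thesis, running ten experiments and averaging their test errors. In practice the data are usually shuffled before splitting so that each fold is representative.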