Fig. 9.17 Correct classification rate for the classification methods on the slope criterion (k nearest neighbors, binary decision tree, random forests, discriminant PLS, discriminant sparse PLS)
Common classification methods were initially used on the slope matrix to predict the alertness state of the participants. Predictive performance of k nearest neighbors [presented in Hastie et al. (2009)], binary decision tree (CART) (Breiman et al. 1984), random forests (Breiman 2001), discriminant PLS [by direct extension of the regression PLS method described in Tenenhaus (1998), recoding the variable to explain using dummy variables], and discriminant sparse PLS (Lê Cao et al. 2008) were studied. The R packages "class," "rpart," "randomForest," "pls," and "spls" were, respectively, used to test these methods. Random forests were applied with the number of trees set to 15,000, leaving the other settings at their defaults. The other methods were tuned by applying a tenfold cross-validation on the training sample (number of neighbors for k nearest neighbors, complexity of the tree for CART, number of components for discriminant PLS, and number of components and value of the thresholding parameter for discriminant sparse PLS). The PLS method was adapted for classification by recoding the variable to predict (alertness) as a matrix of indicators of the modality ("normal" or "relaxed"). To compare the results, these methods were evaluated on the same samples (learning and test). A fivefold cross-validation was used to calculate a classification rate. This operation was repeated 100 times to study the stability of the classification methods with respect to the data partitioning. The results are given by the boxplots in Fig. 9.17.
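The evaluation protocol above can be sketched as follows. This is a minimal illustration, not the study's code: it is written in Python rather than R (the study used the R packages cited above), uses a toy k-nearest-neighbors classifier standing in for the full set of methods, and runs on synthetic two-class data standing in for the slope matrix. The repeated fivefold cross-validated correct classification rate (CCR) is the quantity the boxplots in Fig. 9.17 summarize.

```python
# Hedged sketch of the evaluation protocol (assumptions: Python instead of R,
# a toy k-NN classifier, synthetic stand-in data; only the CV/CCR logic
# mirrors the text).
import random
from collections import Counter

def knn_predict(train, test_x, k=3):
    # Classify test_x by majority vote of its k nearest training points
    # (squared Euclidean distance).
    nearest = sorted(train, key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], test_x)))
    votes = Counter(label for _, label in nearest[:k])
    return votes.most_common(1)[0][0]

def cv_ccr(data, n_folds=5, k=3, rng=None):
    # One fivefold cross-validation: fraction of correctly classified points.
    rng = rng or random.Random(0)
    idx = list(range(len(data)))
    rng.shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    correct = 0
    for fold in folds:
        train = [data[i] for i in idx if i not in fold]
        for i in fold:
            if knn_predict(train, data[i][0], k) == data[i][1]:
                correct += 1
    return correct / len(data)

# Synthetic two-class data: "normal" points near (0, 0), "relaxed" near (2, 2).
gen = random.Random(42)
data = ([((gen.gauss(0, 1), gen.gauss(0, 1)), "normal") for _ in range(40)]
        + [((gen.gauss(2, 1), gen.gauss(2, 1)), "relaxed") for _ in range(40)])

# Repeat the fivefold CV over different random partitions to study stability
# (the text repeats it 100 times; 10 here for brevity).
ccrs = [cv_ccr(data, rng=random.Random(seed)) for seed in range(10)]
print(f"median CCR = {sorted(ccrs)[len(ccrs) // 2]:.2f}")
```

The spread of the repeated CCR values over partitions is what the boxplots and the standard deviations reported below capture.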
It appears that the median correct classification rate (CCR) is very disappointing: it does not exceed 40 % for most methods. Table 9.1 summarizes the means and standard deviations obtained using the classification methods on the slope criterion. Large standard deviations reflect the influence of the data partitioning on the results.

In the case of a binary prediction, these results cannot be satisfactory. It is likely that the inter-individual variability observed in Fig. 9.15 has affected the performance of the classification methods. This inter-individual variability is very difficult to include in the classification methods with the available data for this study. Therefore, the preprocessing has been refined to obtain improved classification