Biomedical Engineering Reference
Fig. 3. Left: MDS view of clusters; right: LDA boundaries in W.Q3, SWS.Q1 space
Measurement of Cluster Separation via Classification
Classification Based on All Duration Quartile Variables. Separation among clusters
was further assessed quantitatively by performing a classification task in which the EM
cluster labels serve as the target class attribute and the variables used for clustering
serve as predictive attributes. Classification accuracy (the fraction of instances
for which the cluster label is correctly predicted) and the area under the Receiver
Operating Characteristic (ROC) plot [14] both remain consistently above 0.80 in the cases
k = 2, 3, 4 for widely used classification techniques including C4.5 (J48) decision tree
learning, naïve Bayes, and multilayer artificial neural networks (ANN). The area under
the ROC plot accounts for prediction errors on a per-class basis, and is a better measure
of classification performance in this context because the class (cluster) sizes are very
dissimilar; accuracy can produce overly optimistic results in such situations. Mean values
of the area under the ROC plot for selected classifiers appear in Table 2. A 4-fold
cross-validation protocol was employed to control variance due to data sampling.
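The point about accuracy being overly optimistic for dissimilar class sizes can be illustrated with a minimal sketch. It uses the k = 2 cluster sizes reported in Table 3 (211 and 33) and a deliberately useless classifier that always predicts the larger cluster. The function names `accuracy` and `roc_auc` are illustrative helpers written for this example, not part of the authors' toolchain, and the choice of the smaller cluster as the positive class is an assumption.

```python
def accuracy(y_true, y_pred):
    """Fraction of instances whose label is predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """Area under the ROC curve via the rank (Mann-Whitney) formulation.

    Equals the probability that a randomly chosen positive instance gets a
    higher score than a randomly chosen negative one; ties count as 1/2.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Cluster sizes for k = 2 as reported in Table 3: 211 and 33 instances.
# Assumption: the smaller cluster is treated as the positive class.
y_true = [0] * 211 + [1] * 33

# A degenerate classifier: always predicts the larger cluster and gives
# every instance the same score, so it carries no information at all.
y_pred = [0] * len(y_true)
scores = [0.0] * len(y_true)

print(f"accuracy = {accuracy(y_true, y_pred):.3f}")
print(f"AUC      = {roc_auc(y_true, scores):.3f}")
```

The degenerate classifier reaches an accuracy of 211/244, roughly 0.865, which would clear the 0.80 mark, while its AUC is 0.5, the value expected from random guessing. This is why the AUC is the more informative figure here.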
Table 2. AUC for selected classifiers

classifier     k = 2   k = 3   k = 4
ANN            0.94    0.97    0.91
J48            0.88    0.90    0.89
naïve Bayes    0.99    0.98    0.98

Table 3. Bout duration cluster sizes

k = 2          k = 3              k = 4
{211, 33}      {148, 19, 77}      {127, 15, 48, 54}
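With clusters as small as 15 or 19 instances, how the 4 cross-validation folds are drawn affects how many members of the smallest cluster each test fold contains. The sketch below shows one way to build stratified folds for the k = 3 cluster sizes in Table 3. Stratification is an assumption made for illustration; the text does not say whether the folds were stratified. The function `stratified_folds` and the integer labels 0, 1, 2 are hypothetical.

```python
import random


def stratified_folds(labels, n_folds=4, seed=0):
    """Assign each instance index to a fold, keeping class proportions
    roughly equal across folds (round-robin within each shuffled class)."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    folds = [[] for _ in range(n_folds)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        for position, idx in enumerate(indices):
            folds[position % n_folds].append(idx)
    return folds


# Cluster sizes for k = 3 from Table 3; labels 0, 1, 2 are placeholders.
sizes = [148, 19, 77]
labels = [c for c, size in enumerate(sizes) for _ in range(size)]

for i, test_idx in enumerate(stratified_folds(labels)):
    counts = [sum(1 for j in test_idx if labels[j] == c)
              for c in range(len(sizes))]
    print(f"fold {i}: test size = {len(test_idx)}, per-cluster counts = {counts}")
```

Under this scheme each test fold holds 4 or 5 members of the 19-instance cluster, so per-class error estimates for that cluster rest on very few instances per fold.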
Classification Based on a Single Pair of Duration Quartile Variables. Observed cluster
separation is only fair in two-dimensional projections of the bout duration dataset onto
pairs of the bout duration clustering variables, as expected based on the MDS visualization
in Fig. 3. An example involves the wake.Q3 and SWS.Q1 bout duration quartile variables.
As observed in Fig. 3, there is considerable overlap among the clusters near the
bottom left corner. Fig. 3 also shows sample decision boundaries in this reduced
two-dimensional feature space using a linear discriminant analysis (LDA) classifier.
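For readers unfamiliar with how LDA produces a straight-line boundary in a two-variable space, here is a minimal two-class sketch. It is not the authors' implementation: the study uses up to four clusters, whereas this example separates only two groups, and the data points are made up for illustration and are not taken from the bout duration dataset. The axes stand in for (wake.Q3, SWS.Q1), and all function names are hypothetical.

```python
import math


def mean2(points):
    """Mean of a list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def pooled_covariance(class_a, class_b):
    """Pooled within-class 2x2 covariance, returned as (sxx, sxy, syy)."""
    sxx = sxy = syy = 0.0
    for points in (class_a, class_b):
        mx, my = mean2(points)
        for x, y in points:
            sxx += (x - mx) ** 2
            sxy += (x - mx) * (y - my)
            syy += (y - my) ** 2
    dof = len(class_a) + len(class_b) - 2
    return sxx / dof, sxy / dof, syy / dof


def fit_lda(class_a, class_b):
    """Return (w, b) such that w[0]*x + w[1]*y + b > 0 predicts class_b."""
    sxx, sxy, syy = pooled_covariance(class_a, class_b)
    det = sxx * syy - sxy * sxy
    ma, mb = mean2(class_a), mean2(class_b)
    dx, dy = mb[0] - ma[0], mb[1] - ma[1]
    # w = S^{-1} (mean_b - mean_a), using the closed-form 2x2 inverse.
    w = ((syy * dx - sxy * dy) / det, (-sxy * dx + sxx * dy) / det)
    mid = ((ma[0] + mb[0]) / 2, (ma[1] + mb[1]) / 2)
    # The log prior ratio shifts the boundary toward the smaller class.
    prior = math.log(len(class_b) / len(class_a))
    b = -(w[0] * mid[0] + w[1] * mid[1]) + prior
    return w, b


def predict(w, b, point):
    """Classify a 2-D point as 'a' or 'b' by the sign of the discriminant."""
    return "b" if w[0] * point[0] + w[1] * point[1] + b > 0 else "a"


# Hypothetical points in a (wake.Q3, SWS.Q1)-like plane; NOT study data.
class_a = [(1.0, 2.0), (2.0, 1.0), (2.0, 3.0), (3.0, 2.0)]
class_b = [(6.0, 7.0), (7.0, 6.0), (7.0, 8.0), (8.0, 7.0)]

w, b = fit_lda(class_a, class_b)
print("boundary: %.2f*x + %.2f*y + %.2f = 0" % (w[0], w[1], b))
print(predict(w, b, (2.5, 2.5)), predict(w, b, (6.5, 7.5)))
```

Because both groups share one pooled covariance matrix, the boundary is a straight line. For this symmetric toy data it reduces to x + y = 9, the perpendicular bisector of the segment joining the two group means.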
Classification Rule Description of Clusters. Use of the rule induction algorithm RIPPER [8]
(JRIP) over the wake.Q3 and SWS.Q1 predictive variables alone, with the
k = 3 cluster label as the class, yields, after pruning and simplification, the classification
rules shown in Fig. 4. The final rule is a default rule that is used when the other
 