1
2
3
4
5
6
9
−0.8
−0.6
−0.4
−0.2
0
0.2
0.4
0.6
0.8
1
Silhouette Value
Fig. 3.14
Silhouette plot of the clustering partition
3.3.7 Explaining the Clusters/Knowledge Generation
Having obtained a clustering of the data as described above, the next important
step in the process of knowledge discovery is to ask “What do the clusters mean?”
This is the layer termed “knowledge generation” in Fig. 3.1. To answer
this question, we use machine learning algorithms that produce “symbolic
classifiers”. These algorithms take a given classification of the data, such as the
clustering calculated above (cf. Fig. 3.15: Spatial distribution of the U-matrix
clustering), and construct from it either decision trees (CART, Breiman et al. 1984;
C4.5, Quinlan 1993; C5.0, Quinlan 2013; Random Forest, Breiman 2001; etc.) or
decision rules (such as sig*, Ultsch 1991, or Ripper, Cohen 1995).
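To illustrate how a tree learner turns cluster labels into a readable condition, the following minimal sketch searches for the single threshold on one feature that best separates the clusters, using the Gini impurity that CART employs as its splitting criterion. It is not the procedure used in this chapter: the feature values, the cluster names UC1 and UC2, and the function names are hypothetical, and a full tree would apply such a split recursively over all features.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, weighted impurity) of the best binary split on one feature.

    Returns (None, impurity of the unsplit node) if no split improves on it.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))
    for i in range(1, n):
        low, high = pairs[i - 1][0], pairs[i][0]
        if low == high:
            continue  # identical values cannot be separated by a threshold
        threshold = (low + high) / 2
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical toy data: one feature value per data point and its cluster label.
feature = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]
cluster = ["UC1", "UC1", "UC1", "UC2", "UC2", "UC2"]

threshold, impurity = best_split(feature, cluster)
print(f"IF feature <= {threshold} THEN UC1 ELSE UC2 (weighted Gini {impurity:.2f})")
```

Each path from the root of a tree to a leaf is a conjunction of such threshold conditions, which is what makes the result readable as a rule.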
In the case at hand, we extracted rules from a CART decision tree built on
the UD data. The generated rules are listed in Appendix 3. The rule generation is
steered so that a spatial planning expert can easily understand the rules (e.g., by
expressing values as low, medium, or high). Applying these rules to the unclassified data assigns
each data point to a cluster. The quality of the rules can be assessed by drawing up a
contingency table of the clustering vs. the cluster labels assigned by the rules
(cf. Table 3.3). The two outlier clusters UC8 and UC9 are not taken into account in
the rule-based explanation.
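The check described above can be sketched in a few lines. The rule set below is invented for illustration in the low/medium/high style and does not reproduce the rules of Appendix 3; the variable `x`, its thresholds, the cluster names, and the toy data are all assumptions.

```python
from collections import Counter


def rule_label(point):
    """Hypothetical rule set; the thresholds are invented for illustration."""
    if point["x"] < 2.0:   # "x is low"
        return "UC1"
    if point["x"] < 5.0:   # "x is medium"
        return "UC2"
    return "UC3"           # "x is high"


def contingency_table(cluster_labels, rule_labels):
    """Count how often each (cluster label, rule-assigned label) pair occurs."""
    return Counter(zip(cluster_labels, rule_labels))


def agreement(table):
    """Fraction of data points for which the rules reproduce the cluster label."""
    total = sum(table.values())
    matches = sum(n for (c, r), n in table.items() if c == r)
    return matches / total


# Hypothetical toy data: six points with their labels from the clustering.
points = [{"x": 1.0}, {"x": 1.5}, {"x": 3.0}, {"x": 4.0}, {"x": 6.0}, {"x": 4.5}]
clusters = ["UC1", "UC1", "UC2", "UC2", "UC3", "UC3"]

rules = [rule_label(p) for p in points]
table = contingency_table(clusters, rules)

# Print the table with clusters as rows and rule-assigned labels as columns.
names = sorted(set(clusters) | set(rules))
print("cluster", *names)
for c in names:
    print(c, *(table[(c, r)] for r in names))
print(f"agreement: {agreement(table):.2f}")
```

Off-diagonal counts show which clusters the rules confuse with one another, which a single agreement figure would hide.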
Because the rules assign almost all data points to the correct clusters, they can
be regarded as sufficiently precise. A spatial planning expert can read and
understand the rules and use them to assign a meaning to each cluster.