Geoscience Reference
In-Depth Information
1
2
3
4
5
6
9
−0.8
−0.6
−0.4
−0.2
0
0.2
0.4
0.6
0.8
1
Silhouette Value
Fig. 3.14
Silhouette plot of the clustering partition
3.3.7
Explaining the Clusters/Knowledge Generation
Having obtained a clustering of the data as described above, the next important
step in the process of knowledge discovery is to ask “What do the clusters mean?”.
This is the layer termed “knowledge generation” in Fig.
3.1
. In order to answer
this question, we use algorithms from machine learning to produce “symbolic
classifiers”. These algorithms take a given classification of the data, such as the
clustering calculated above (cf. Fig.
3.15
: Spatial distribution of the U-matrix
clustering), and construct from this decision trees (CART Breiman et al.
1984
;
C4.5 Quinlan
1993
; C5.0 Quinlan
2013
; Random Forest Breiman
2001
etc.) or
decision rules (such as sig* Ultsch
1991
or Ripper Cohen
1995
etc.).
In the case at hand, we applied rule extraction from a CART decision tree on
the UD data. The generated rules are listed in
Appendix 3
. The rule generation is
steered so that a spatial planning expert could easily understand the rules (e.g., low,
medium, high values). The application of these rules to the unclassified data assigns
a data point to a cluster. The quality of the rules can be reviewed by drawing up a
contingency table of the clustering vs. the assignment of cluster labels by the rules
(cf. Table
3.3
). The two outlier cluster UC8 and UC9 are not taken into account for
a rule-based explanation.
As the rules assign almost all data to the correct clusters, it can be assumed
that they are sufficiently precise. The rules can be read and understood by a spatial
planning expert in order to assign meaning to a particular cluster.
Search WWH ::
Custom Search