Knowledge Discovery in Spatial Planning Data: A Concept for Cluster Understanding - Computational Approaches for Urban Environments

Frequently the number of partitions must be defined beforehand. Here we adopt

the approach of partitional clustering, building on the emergent self-organizing

feature map. The structures of the U-matrix define the clusters: data points

belong to the same cluster when their projections (bestmatches) lie in a common valley.

The neurons of an ESOM can also be clustered using the clustering algorithm

U*C, which is based on grid projections and makes use of distance and density

information (Ultsch and Herrmann 2006). In our case, this approach leads to nine

U-matrix clusters (UC).
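
The notion of "valleys" in a U-matrix can be made concrete with a minimal sketch. It assumes a planar (non-toroidal) grid, a four-neighbour neighbourhood, and the mean Euclidean distance to the neighbours as the height of a neuron; none of these choices is stated in the text, and the function name `u_matrix` and the toy weights are purely illustrative. Low heights mark valleys (similar neighbouring neurons), high heights mark the ridges that separate clusters.

```python
import math

def u_matrix(weights):
    """Return U-heights for a grid of neuron weight vectors.

    weights[r][c] is the weight vector of the neuron at grid position (r, c).
    Assumption: height = mean Euclidean distance to the 4 direct grid neighbours.
    """
    n_rows, n_cols = len(weights), len(weights[0])
    heights = [[0.0] * n_cols for _ in range(n_rows)]
    for r in range(n_rows):
        for c in range(n_cols):
            dists = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                # Planar grid: neighbours outside the map are simply skipped.
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    dists.append(math.dist(weights[r][c], weights[rr][cc]))
            heights[r][c] = sum(dists) / len(dists) if dists else 0.0
    return heights

# Toy 1 x 3 map (hypothetical weights): two similar neurons and one distant neuron.
toy_weights = [[(0.0, 0.0), (0.0, 0.0), (10.0, 0.0)]]
toy_heights = u_matrix(toy_weights)
```

In this toy map the heights rise from left to right, so the first two neurons lie in a common valley while the third sits behind a ridge.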

Clustering methods partition the data into clusters. The cluster structures

critically depend, first, on the definition of a meaningful measure of distance (see above)

and, second, on the details of the clustering algorithm. If a known pre-classification

is at hand, then this may be used to evaluate the clustering. However, in most

real knowledge discovery cases, no such pre-classification is given. The question

arises as to which form of clustering is optimal. For the purposes of knowledge

discovery, the quality of any data clustering is determined by whether the resulting

classes offer some useful interpretation; in particular, whether these data classes

reveal unsuspected structures and correlations in the original data space. Hand et al.

(2001) emphasize that the numerical size of clusters should not be accorded too

great importance, since what is being sought is precisely the unexpected, the

pattern that goes against the rules.

Generally speaking, however, the validity of a clustering is often in the eye of the beholder;

for example, if a cluster produces an interesting scientific insight, we can judge it to be

useful. (Hand et al. 2001, p. 292)

In such cases, where new structures are detected, other unsupervised

approaches should be adopted to validate the clustering results. One such approach

is to cluster the data using a different clustering algorithm. Another is to calculate

some cluster-immanent measure. Finally, the approach which best meets the aims

of knowledge discovery is to seek a semantic interpretation of the detected clusters.

This means determining whether a cluster makes sense through the application of

knowledge generation methods (see next section).

Figure 3.13 shows a hierarchical clustering of the data using Ward clustering

(Ward 1963) to produce a dendrogram (Carlsson and Mémoli 2010). To obtain a

partition from a hierarchical algorithm, the user has to define either a threshold

distance or the number of clusters. In our case, a threshold distance of 100 was

used, giving eight Ward clusters (WC).
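
Cutting a Ward dendrogram at a threshold distance can be sketched in a few lines of plain Python. This is an illustration under assumptions, not the procedure used for Figure 3.13: it assumes Euclidean distances between data points, the Lance-Williams update for Ward's criterion, and the common convention in which the merge height of two single points equals their Euclidean distance. The function name `ward_clusters` and the toy points are hypothetical, and the toy threshold is unrelated to the value of 100 quoted above.

```python
import math
from itertools import combinations

def ward_clusters(points, threshold):
    """Merge clusters by Ward's criterion until the cheapest merge exceeds threshold."""
    # Active clusters: cluster id -> list of member point indices.
    members = {i: [i] for i in range(len(points))}
    # Squared Ward distances between active clusters, keyed by (smaller id, larger id).
    d2 = {(i, j): sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
          for i, j in combinations(range(len(points)), 2)}
    next_id = len(points)
    while len(members) > 1:
        (s, t), best = min(d2.items(), key=lambda kv: kv[1])
        if math.sqrt(best) > threshold:
            break  # the dendrogram is cut here
        ns, nt = len(members[s]), len(members[t])
        updated = {}
        for v in members:
            if v in (s, t):
                continue
            nv = len(members[v])
            dvs = d2[tuple(sorted((v, s)))]
            dvt = d2[tuple(sorted((v, t)))]
            # Lance-Williams update for Ward's method (on squared distances).
            updated[(v, next_id)] = (
                (nv + ns) * dvs + (nv + nt) * dvt - nv * best
            ) / (nv + ns + nt)
        # Drop all distances that involve the two merged clusters.
        d2 = {k: val for k, val in d2.items() if s not in k and t not in k}
        d2.update(updated)
        members[next_id] = members.pop(s) + members.pop(t)
        next_id += 1
    # Number the clusters 1, 2, ... in order of their first member.
    labels = [0] * len(points)
    for label, idxs in enumerate(sorted(members.values(), key=min), start=1):
        for i in idxs:
            labels[i] = label
    return labels

# Toy data (hypothetical): two tight pairs that lie far apart.
toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
toy_labels = ward_clusters(toy_points, threshold=3.0)
```

Raising the threshold merges more branches of the dendrogram and yields fewer clusters; lowering it has the opposite effect, which is why the choice of threshold (or of the number of clusters) is left to the user.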

The results of different clustering algorithms can be compared using contingency

tables (Fienberg 2007). In our case, the two methods have produced rather similar

clustering partitions (cf. Table 3.2). One of the outlier clusters, that is, number UC9

in the U-matrix clustering, has been subsumed into Ward cluster WC6. In this case,

the Ward clustering basically confirms the U-matrix clustering and vice versa.
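
A contingency table of two partitions simply counts how many data points fall into each pair of cluster labels. The following minimal sketch shows the idea; the function name `contingency_table` and the label lists are invented for illustration and do not reproduce the counts of Table 3.2.

```python
from collections import Counter

def contingency_table(labels_a, labels_b):
    """Cross-tabulate two cluster labelings of the same data points."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same data points")
    counts = Counter(zip(labels_a, labels_b))
    rows = sorted(set(labels_a))
    cols = sorted(set(labels_b))
    # table[i][j] = number of points with label rows[i] in A and cols[j] in B.
    table = [[counts[(r, c)] for c in cols] for r in rows]
    return rows, cols, table

# Hypothetical labels for six data points (not the values from the study).
uc = ["UC1", "UC1", "UC2", "UC2", "UC9", "UC2"]
wc = ["WC1", "WC1", "WC6", "WC6", "WC6", "WC6"]
rows, cols, table = contingency_table(uc, wc)
```

If each row has most of its mass in a single column, the two partitions largely agree. A row whose only entry shares a column with another row, as for the toy labels UC2 and UC9 here, indicates that one method has merged clusters that the other keeps apart.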

The silhouettes proposed by Rousseeuw (1987) are a useful graphical display

for the interpretation and validation of data partitioning. The values in a silhouette

range from -1 to +1 for each data point. Large positive values indicate that a data
