Clustering Contextual Neural Gas: A New Approach for Spatial Planning and Analysis Tasks - Computational Approaches for Urban Environments

Geoscience Reference

In-Depth Information

developed algorithm for clustering spatial data that is specially designed for spatial

data mining. The CNG algorithm combines the concepts of the NG algorithm with

the GeoSOM (Bação et al. 2005 ), a variant of the famous self-organizing map

algorithm (SOM; Kohonen 1982 , 2001 ), in order to take spatial dependence into

account. A particular advantage of the CNG is that it quantizes the data space better

than the GeoSOM, because the adaptation of the CNG's neurons, in contrast to

the SOM, does not depend on some predefined and fixed topology (Hagenauer

and Helbich 2013 ). However, the topology of the SOM facilitates the analysis of

the SOM and hence supports the understanding of the properties of the data (e.g.,

Hagenauer et al. 2011 ; Skupin and Esperbé 2011 ; Arribas-Bel and Schmidt 2013 ).

In particular, it is useful for determining the actual number of clusters in the data,

either computationally (e.g., Murtagh 1995 ; Costa and De Andrade Netto 1999 )or

by visualizing it (see Flexer 2001 ).

Another important special case of clustering is graph clustering. A graph is a

set of vertices and edges that are connections between pairs of vertices. The edges

can have a weight assigned which indicates the strength of the connection and can

be directed or undirected. The task of spatial clustering is organizing the vertices

of a graph into clusters such that the vertices within a cluster are better connected

than the vertices within different clusters. The ability to find and analyze clusters

is useful for understanding and visualizing the structure of networks, which is of

great importance in many research areas that deal with social, technological, or

information systems. Many different algorithms have been developed in the past for

this purpose (see Schaeffer 2007 ). From the large set of available graph clustering

algorithms, the heuristic multilevel modularity optimization algorithm (MLMO;

Blondel et al. 2008 ) is particularly promising, because it is exceptionally fast even

for very large graphs and automatically determines the number of clusters in the

graph by optimizing its quality score.

This study introduces a new method that combines CNG, topology learning, and

graph clustering algorithms to outline clusters in CNG. The method consists of the

following steps: First, a CNG consisting of a sufficient number of neurons is trained.

Second, a topology of the neurons is learned and the resulting topology is considered

a weighted graph. Finally, this graph is clustered using advanced graph clustering

algorithms, which do not require to specify the desired number of clusters. The

resulting clusters represent homogeneous regions in the input data. Since the number

of clusters is automatically determined depending on the topological patterns of the

graph, the method is especially useful for outlining spatial clusters when no prior

knowledge about the actual number of clusters is available.

This workflow is closely related to the clustering approach using the GeoSOM

algorithm. In this approach, a GeoSOM consisting of a sufficiently large number

of neurons is trained to project the input data onto a two-dimensional map.

Subsequently, the map is visualized, usually by means of a U-matrix (Ultsch and

Siemon 1990 ). Clusters appear on the U-matrix as valleys, cluster boundaries as

ridges. However, for complex and high-dimensional data sets, U-matrices often

show no clear patterns so that it is difficult or even impossible to determine clusters,

in particular when using computational methods.

Computational Approaches for Urban Environments

Search WWH ::

Custom Search

Home