Geoscience Reference
In-Depth Information
developed algorithm for clustering spatial data that is specially designed for spatial
data mining. The CNG algorithm combines the concepts of the NG algorithm with
the GeoSOM (Bação et al. 2005 ), a variant of the famous self-organizing map
algorithm (SOM; Kohonen 1982 , 2001 ), in order to take spatial dependence into
account. A particular advantage of the CNG is that it quantizes the data space better
than the GeoSOM, because the adaptation of the CNG's neurons, in contrast to
the SOM, does not depend on some predefined and fixed topology (Hagenauer
and Helbich 2013 ). However, the topology of the SOM facilitates the analysis of
the SOM and hence supports the understanding of the properties of the data (e.g.,
Hagenauer et al. 2011 ; Skupin and Esperbé 2011 ; Arribas-Bel and Schmidt 2013 ).
In particular, it is useful for determining the actual number of clusters in the data,
either computationally (e.g., Murtagh 1995 ; Costa and De Andrade Netto 1999 )or
by visualizing it (see Flexer 2001 ).
Another important special case of clustering is graph clustering. A graph is a
set of vertices and edges that are connections between pairs of vertices. The edges
can have a weight assigned which indicates the strength of the connection and can
be directed or undirected. The task of spatial clustering is organizing the vertices
of a graph into clusters such that the vertices within a cluster are better connected
than the vertices within different clusters. The ability to find and analyze clusters
is useful for understanding and visualizing the structure of networks, which is of
great importance in many research areas that deal with social, technological, or
information systems. Many different algorithms have been developed in the past for
this purpose (see Schaeffer 2007 ). From the large set of available graph clustering
algorithms, the heuristic multilevel modularity optimization algorithm (MLMO;
Blondel et al. 2008 ) is particularly promising, because it is exceptionally fast even
for very large graphs and automatically determines the number of clusters in the
graph by optimizing its quality score.
This study introduces a new method that combines CNG, topology learning, and
graph clustering algorithms to outline clusters in CNG. The method consists of the
following steps: First, a CNG consisting of a sufficient number of neurons is trained.
Second, a topology of the neurons is learned and the resulting topology is considered
a weighted graph. Finally, this graph is clustered using advanced graph clustering
algorithms, which do not require to specify the desired number of clusters. The
resulting clusters represent homogeneous regions in the input data. Since the number
of clusters is automatically determined depending on the topological patterns of the
graph, the method is especially useful for outlining spatial clusters when no prior
knowledge about the actual number of clusters is available.
This workflow is closely related to the clustering approach using the GeoSOM
algorithm. In this approach, a GeoSOM consisting of a sufficiently large number
of neurons is trained to project the input data onto a two-dimensional map.
Subsequently, the map is visualized, usually by means of a U-matrix (Ultsch and
Siemon 1990 ). Clusters appear on the U-matrix as valleys, cluster boundaries as
ridges. However, for complex and high-dimensional data sets, U-matrices often
show no clear patterns so that it is difficult or even impossible to determine clusters,
in particular when using computational methods.
Search WWH ::




Custom Search