Clustering Contextual Neural Gas: A New Approach for Spatial Planning and Analysis Tasks - Computational Approaches for Urban Environments


1. Contextual neural gas: The CNG algorithm clusters the data set into n spatial clusters, where n is the number of neurons. The actual number of clusters in the data is typically unknown; n must be chosen large enough so that a reasonable cluster structure can be detected in the subsequent steps. However, if n is too large, some of the CNG's neurons may not map any data at all. These neurons must not be removed, because the rank ordering of CNG depends on the number of neurons (Hagenauer and Helbich 2013).

2. Topology learning: A topology of the CNG's neurons is learned with a modification of the CHL algorithm. The algorithm can be described as follows: For each input vector, the ranking order of the neurons is determined according to the two-phase procedure of the CNG, and a connection between the two highest-ranked neurons is added to the connection set. Additionally, the number of times each connection has been added to the set is stored. This count indicates the strength of the connection and is used in the next step.

3. Graph clustering: Before the resulting graph is clustered, isolated vertices, that is, vertices not connected to any other vertex, are removed, because the neurons they represent do not map any data and carry no valuable topological information. The graph is then clustered based on its structural properties using the MLMO algorithm.
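
Steps 2 and 3 can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that each vector stores the x and y coordinates first and the remaining attributes afterwards, and that the two-phase ranking orders the neurons by spatial distance and then re-orders the first l of them by attribute distance. The function names and the parameter l are hypothetical, and the final MLMO clustering is not shown.

```python
import math
from collections import Counter


def cng_ranking(point, neurons, l):
    """Assumed two-phase ranking: spatial order first, then the l
    spatially closest neurons are re-ordered by attribute distance."""
    def spatial(i):
        return math.dist(point[:2], neurons[i][:2])

    def attribute(i):
        return math.dist(point[2:], neurons[i][2:])

    order = sorted(range(len(neurons)), key=spatial)
    return sorted(order[:l], key=attribute) + order[l:]


def learn_topology(data, neurons, l):
    """Connect the two highest-ranked neurons for every input vector
    and count how often each connection is added (its strength)."""
    strengths = Counter()
    for point in data:
        first, second = cng_ranking(point, neurons, l)[:2]
        strengths[frozenset((first, second))] += 1
    return strengths


def connected_vertices(strengths):
    """Vertices with at least one connection; all others are isolated
    and would be dropped before graph clustering."""
    return set().union(*strengths) if strengths else set()


# Toy example: four neurons, three input vectors near neurons 0 and 1.
neurons = [(0, 0, 0), (1, 0, 0), (10, 10, 1), (50, 50, 1)]
data = [(0.1, 0.0, 0), (0.9, 0.0, 0), (0.2, 0.1, 0)]
strengths = learn_topology(data, neurons, l=2)
kept = connected_vertices(strengths)
```

In this toy example only neurons 0 and 1 are ever connected, so neurons 2 and 3 would be removed as isolated vertices. The stored counts could then serve as edge weights for the graph clustering.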

4.4 Experiments

To evaluate the proposed method, two experiments on different data sets are conducted. In both experiments, a CNG with 25 neurons is applied. The neurons are randomly initialized and the training time is set to 100,000 iterations. The neighborhood range and the adaptation rate are chosen as proposed by Martinetz et al. (1993).
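
As a rough illustration of such a parameter schedule, the sketch below decays a parameter exponentially from an initial to a final value over the training time, which is the general form used for neural gas. The concrete start and end values in the example are placeholders chosen for illustration; they are assumptions, not the settings reported for these experiments.

```python
import math


def decayed(initial, final, t, t_max):
    """Exponential decay from `initial` (t = 0) to `final` (t = t_max)."""
    return initial * (final / initial) ** (t / t_max)


T_MAX = 100_000  # training time stated in the text

# Placeholder values (assumptions): adaptation rate from 0.5 to 0.005,
# neighborhood range from 12.5 to 0.01.
rate_start = decayed(0.5, 0.005, 0, T_MAX)
rate_mid = decayed(0.5, 0.005, T_MAX // 2, T_MAX)
rate_end = decayed(0.5, 0.005, T_MAX, T_MAX)
range_end = decayed(12.5, 0.01, T_MAX, T_MAX)
```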

4.4.1 Synthetic Data

In this experiment, a synthetic data set with clearly determined properties is constructed, so that the results of the proposed method can be evaluated easily. The data set consists of five clusters: one large cluster in the middle with low point density and four smaller clusters in the corners with higher point density (see Fig. 4.1). Each cluster contains 200 random data points, and each point has three attributes: the x and y coordinates and a synthetic attribute whose value is zero for the middle cluster and one otherwise.
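
A data set with these properties could be generated along the following lines. This is a sketch under stated assumptions: the cluster extents, the unit-square layout, and the uniform sampling are hypothetical, since the text specifies only the number of clusters, the number of points per cluster, the relative densities, and the attribute values.

```python
import random


def make_synthetic(points_per_cluster=200, seed=0):
    """Return (x, y, attribute) tuples for one sparse middle cluster
    and four dense corner clusters."""
    rng = random.Random(seed)
    # (x_min, x_max, y_min, y_max, attribute); extents are assumptions.
    boxes = [
        (0.2, 0.8, 0.2, 0.8, 0),  # large middle cluster, low density
        (0.0, 0.2, 0.0, 0.2, 1),  # lower-left corner
        (0.8, 1.0, 0.0, 0.2, 1),  # lower-right corner
        (0.0, 0.2, 0.8, 1.0, 1),  # upper-left corner
        (0.8, 1.0, 0.8, 1.0, 1),  # upper-right corner
    ]
    data = []
    for x0, x1, y0, y1, attr in boxes:
        for _ in range(points_per_cluster):
            data.append((rng.uniform(x0, x1), rng.uniform(y0, y1), attr))
    return data


points = make_synthetic()
```

Because every cluster holds the same number of points, the smaller extents of the corner boxes give them a higher point density than the middle cluster.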

The main challenge when clustering this data set is to differentiate between the spatial clusters in the corners of the data set, because their borders are defined by spatial point density. Spatial clustering algorithms that solely consider the spatial distances between points and/or the similarity of the points' attribute values are likely
