1. Contextual neural gas: The CNG algorithm clusters the data set into n spatial
clusters, where n is the number of neurons. The actual number of clusters in the
data is typically unknown; n must be chosen large enough so that a reasonable
cluster structure can be detected in the subsequent steps. However, if n is too
large, some of the CNG's neurons may not map any data at all. These neurons
must not be removed, because the rank ordering of CNG depends on the number
of neurons (Hagenauer and Helbich 2013).
2. Topology learning: A topology of the CNG's neurons is learned with a
modification of the CHL algorithm. The algorithm can be described as follows: For
each input vector, the ranking order of neurons is determined according to the
two-phase procedure of the CNG, and a connection between the two highest
ranked neurons is added to the connection set. Additionally, the number of times
a connection has been added to the set is stored for each connection. This number
finally indicates the strength of a connection and is of use in the next step.
3. Graph clustering: Before clustering the resulting graph, single vertices that
are not connected to any other vertex are removed because the neurons that
these vertices represent do not map any data and bear no valuable topological
information. Then the graph is clustered based on its structural properties using
the MLMO algorithm.
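The topology-learning step and the removal of unconnected vertices can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes that the two-phase CNG ranking first orders neurons by spatial distance and then reorders the l spatially closest neurons by attribute distance; the function names, the parameter l, and the array layout are hypothetical. The MLMO clustering itself is not shown.

```python
from collections import Counter

import numpy as np


def cng_ranking(x_spatial, x_attr, w_spatial, w_attr, l):
    """Assumed two-phase ranking: sort neurons by spatial distance, then
    reorder the l spatially closest neurons by attribute distance."""
    d_spatial = np.linalg.norm(w_spatial - x_spatial, axis=1)
    order = np.argsort(d_spatial, kind="stable")
    head = order[:l]
    d_attr = np.linalg.norm(w_attr[head] - x_attr, axis=1)
    head = head[np.argsort(d_attr, kind="stable")]
    return np.concatenate([head, order[l:]])


def learn_topology(data_spatial, data_attr, w_spatial, w_attr, l):
    """Connect the two highest-ranked neurons for every input vector and
    count how often each connection was added (its strength)."""
    strength = Counter()
    for xs, xa in zip(data_spatial, data_attr):
        rank = cng_ranking(xs, xa, w_spatial, w_attr, l)
        i, j = int(rank[0]), int(rank[1])
        strength[(min(i, j), max(i, j))] += 1  # undirected edge
    return strength


def connected_neurons(strength):
    """Neurons that appear in at least one connection; all other neurons
    are the single vertices dropped before graph clustering."""
    return sorted({v for edge in strength for v in edge})
```

The returned counter maps each undirected neuron pair to its connection strength, which could serve as an edge weight for the subsequent graph clustering.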
4.4 Experiments
To evaluate the proposed method, two experiments on different data sets are
conducted. In both experiments, a CNG with 25 neurons is applied. The neurons
are randomly initialized and the training time is set to 100,000 iterations. The
neighborhood range and the adaptation rate are chosen as proposed by Martinetz
et al. (1993).
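In neural gas training, the neighborhood range and the adaptation rate usually decay exponentially over the training time. The sketch below shows that schedule for the settings stated above (25 neurons, 100,000 iterations). The initial and final values are illustrative assumptions, not values given in this text, and the name decayed is hypothetical.

```python
T_MAX = 100_000   # training time stated in the text
N_NEURONS = 25    # number of neurons stated in the text

# Assumed start and end values, chosen only for illustration.
LAMBDA_I, LAMBDA_F = N_NEURONS / 2, 0.01   # neighborhood range
EPS_I, EPS_F = 0.5, 0.005                  # adaptation rate


def decayed(initial, final, t, t_max=T_MAX):
    """Exponential decay from `initial` at t = 0 to `final` at t = t_max."""
    return initial * (final / initial) ** (t / t_max)


# Parameter values at the start, middle, and end of training.
for t in (0, T_MAX // 2, T_MAX):
    print(t, decayed(LAMBDA_I, LAMBDA_F, t), decayed(EPS_I, EPS_F, t))
```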
4.4.1 Synthetic Data
In this experiment, a synthetic data set is constructed whose properties are clearly
determined. Consequently, the results of the proposed method can be easily
evaluated. The data set consists of five clusters: one large cluster in the middle with
low point density and four smaller clusters in the corners with higher point density
(see Fig. 4.1). Each cluster contains 200 random data points and each point has three
attributes: the x and y coordinates and a synthetic attribute, whose value is zero for
the middle cluster and otherwise one.
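A data set with these properties could be generated as in the following sketch. The cluster extents, the unit-square layout, the uniform sampling, and the name make_synthetic are assumptions made for illustration; only the number of clusters, the 200 points per cluster, and the attribute values come from the description above.

```python
import numpy as np


def make_synthetic(points_per_cluster=200, seed=0):
    """Return an array with columns x, y, a for five clusters in the unit
    square: one large middle cluster (a = 0) and four corner clusters (a = 1)."""
    rng = np.random.default_rng(seed)
    n = points_per_cluster

    # Large middle cluster: same point count on a larger area, so lower density.
    middle_xy = rng.uniform(0.25, 0.75, size=(n, 2))
    blocks = [np.column_stack([middle_xy, np.zeros(n)])]

    # Four small corner clusters: same point count on a smaller area.
    side = 0.15  # assumed edge length of a corner cluster
    for cx, cy in [(0.0, 0.0), (0.85, 0.0), (0.0, 0.85), (0.85, 0.85)]:
        xy = rng.uniform(0.0, side, size=(n, 2)) + np.array([cx, cy])
        blocks.append(np.column_stack([xy, np.ones(n)]))

    return np.vstack(blocks)


data = make_synthetic()
```

Because every cluster holds the same number of points, the difference in point density follows directly from the difference in cluster area.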
The main challenge when clustering this data set is to differentiate between the
spatial clusters in the corners of the data set, because their borders are defined by
spatial point density. Spatial clustering algorithms which solely consider the spatial
distances between points and/or the similarity of the points' attribute value are likely