even construction of clusters with a single member. Selecting the SOM
topology before training does not guarantee that all clusters will be
populated after training. The most common SOM organization is a square
grid, although hexagonal grids and grids of higher dimension are also
used.
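Whether a neuron ends up populated can be checked after training. Assign every sample to its best-matching unit (BMU), the neuron with the closest weight vector, and count the hits per neuron. The sketch below assumes squared Euclidean distance and uses a hypothetical, already trained 2 x 2 map and made-up samples. `bmu_index` and `hit_counts` are illustrative helper names, not functions from any SOM library.

```python
import math


def bmu_index(weights, x):
    """Return the index of the neuron whose weight vector is closest to x."""
    best, best_d = 0, math.inf
    for i, w in enumerate(weights):
        # Squared Euclidean distance (assumed metric).
        d = sum((wi - xi) ** 2 for wi, xi in zip(w, x))
        if d < best_d:
            best, best_d = i, d
    return best


def hit_counts(weights, data):
    """Count how many samples map to each neuron (its 'cluster')."""
    counts = [0] * len(weights)
    for x in data:
        counts[bmu_index(weights, x)] += 1
    return counts


# Hypothetical trained 2 x 2 map, flattened row by row, with 2-D weights.
weights = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0)]
data = [(0.1, 0.0), (0.0, 0.9), (0.9, 0.1), (0.2, 0.1)]

counts = hit_counts(weights, data)
empty = [i for i, c in enumerate(counts) if c == 0]
print(counts, empty)  # the neuron at (5, 5) receives no samples
```

A neuron with zero hits is one of the unpopulated clusters mentioned above. It still takes part in the map's topology, but no sample is assigned to it.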
Together with the trained SOM, which represents the distribution of the
data, it is important to assess the so-called U-matrix (unified distance
matrix), which visualizes the distances between neighboring neurons in
the SOM. It is essentially a two-dimensional representation of the
elastic SOM grid. By inspecting the U-matrix, it is possible to visualize
the spatial distribution of data clusters, for example, to detect regions
of the map that are densely populated with data and regions that contain
almost no data. Such mapping techniques allow complex multidimensional
data to be analyzed in an intuitively comprehensible visual manner
(Ivanenkov et al., 2009). The color scale of each SOM is built from the
original values of the high-dimensional data.
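A U-matrix can be sketched in a few lines. This is a simplified, per-neuron variant (an assumption here): each cell holds the average Euclidean distance from a neuron's weight vector to those of its four direct neighbors on a square grid. The original formulation also inserts extra cells between neurons, and hexagonal grids would use six neighbors. The 3 x 3 weight grid is hypothetical.

```python
import math


def euclid(a, b):
    """Euclidean distance between two weight vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def u_matrix(grid):
    """Average distance from each neuron to its 4-connected neighbors.

    grid[r][c] is the weight vector of the neuron at row r, column c
    (square-grid topology assumed).
    """
    rows, cols = len(grid), len(grid[0])
    u = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            dists = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    dists.append(euclid(grid[r][c], grid[nr][nc]))
            u[r][c] = sum(dists) / len(dists)
    return u


# Hypothetical 3 x 3 map: two tight groups of weights separated by a gap
# between the middle and right columns.
grid = [
    [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0)],
    [(0.0, 0.1), (0.1, 0.1), (5.0, 5.1)],
    [(0.0, 0.2), (0.1, 0.2), (5.1, 5.1)],
]

u = u_matrix(grid)
for row in u:
    print([round(v, 2) for v in row])
```

Cells deep inside a tight group get small values, which would be drawn in blue in a map such as Figure 5.15. Cells on either side of the gap get large values, which would be drawn in red.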
Figure 5.15 shows an example of SOMs developed for a three-dimensional
space (three variables: Var1, Var2, and Var3), together with the U-matrix,
which reveals two data clusters (blue) separated by an empty region
(red). The color scale next to the U-matrix indicates that blue marks
short distances between map cells (so these clusters are more densely
populated), whereas red marks large distances. The elastic bonds between
neurons are stretched across the red empty region. Looking at the other
maps, we can see that the three-dimensional space is reduced to a
two-dimensional one while its topological features are preserved. The
map cells are laid out identically on every map, so each sample occupies
the same position on each of the maps. The top-left corner cell, for
example, corresponds to one sample, and its Var1, Var2, and Var3 values
can be read from the respective maps. The color scale next to the maps
indicates the numeric or discrete values of the samples (data
normalization is often required). Analyzing the data clusters in each of
the three SOMs, we can observe that the upper cluster has the highest
values of all three variables (Var1, Var2, and Var3).
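Every map shares the same cell layout, so the per-variable maps (often called component planes) are simply slices of one grid of weight vectors. Normalization is typically applied to the raw data before training. The sketch assumes min-max scaling to [0, 1], which is one common choice (z-scores are another). It uses hypothetical samples and a hypothetical 2 x 2 grid of three-variable weight vectors. The helper names are illustrative only.

```python
def min_max_normalize(data):
    """Scale every variable (column) of data to the range [0, 1]."""
    cols = list(zip(*data))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        tuple((v - lo) / (hi - lo) if hi > lo else 0.0  # constant column -> 0
              for v, lo, hi in zip(row, lows, highs))
        for row in data
    ]


def component_plane(grid, k):
    """Extract variable k of every neuron's weight vector as a 2-D map."""
    return [[w[k] for w in row] for row in grid]


# Hypothetical raw samples (Var1, Var2, Var3) on very different scales.
raw = [(1.0, 200.0, 0.0), (3.0, 100.0, 4.0), (2.0, 300.0, 2.0)]
norm = min_max_normalize(raw)
print(norm)

# Hypothetical trained 2 x 2 map: the left column is the "high-value" cluster.
grid = [
    [(0.9, 0.8, 0.7), (0.2, 0.1, 0.3)],
    [(0.8, 0.9, 0.6), (0.1, 0.2, 0.2)],
]
var1_map = component_plane(grid, 0)
print(var1_map)
```

The cell at position (0, 0) holds the same neuron in every component plane. This is why one sample's Var1, Var2, and Var3 values can be read off the same position on each map.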
As previously mentioned, the vectors m_i of the SOM neurons have
traditionally been initialized randomly. It was assumed that SOM neurons
have a strong tendency to self-organize, so that order can emerge even
when starting from a disordered state. Over time, however, it was
demonstrated that SOM training can be much faster if the initial neuron
vectors are selected from values corresponding to the largest principal
components of the data set.
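The text does not say exactly how the principal components are turned into initial neuron vectors. The sketch below assumes the common linear-initialization scheme. Under that scheme, the grid is spread over the plane spanned by the two largest principal components, centered on the data mean, and scaled by the square roots of the corresponding eigenvalues. To stay within the Python standard library, the sketch finds the two components by power iteration with deflation instead of a full eigendecomposition. All function names and the sample data are hypothetical.

```python
import math


def mean_vector(data):
    n = len(data)
    return [sum(col) / n for col in zip(*data)]


def covariance(data, mu):
    """Sample covariance matrix (n - 1 denominator)."""
    n, d = len(data), len(mu)
    cov = [[0.0] * d for _ in range(d)]
    for x in data:
        diff = [xi - mi for xi, mi in zip(x, mu)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += diff[i] * diff[j] / (n - 1)
    return cov


def power_iteration(mat, iters=500):
    """Dominant eigenvalue and unit eigenvector of a symmetric PSD matrix."""
    d = len(mat)
    v = [1.0 + i for i in range(d)]  # arbitrary non-zero start vector
    lam = 0.0
    for _ in range(iters):
        w = [sum(mat[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        if norm == 0.0:
            return 0.0, v
        v = [wi / norm for wi in w]
        lam = norm  # ||M v|| for unit v approaches the top eigenvalue
    return lam, v


def top_two_components(cov):
    """First two principal components via power iteration plus deflation."""
    lam1, e1 = power_iteration(cov)
    d = len(cov)
    deflated = [[cov[i][j] - lam1 * e1[i] * e1[j] for j in range(d)]
                for i in range(d)]
    lam2, e2 = power_iteration(deflated)
    return (lam1, e1), (lam2, e2)


def linear_init(data, rows, cols):
    """Spread a rows x cols grid over the plane of the two largest PCs."""
    mu = mean_vector(data)
    (lam1, e1), (lam2, e2) = top_two_components(covariance(data, mu))
    s1, s2 = math.sqrt(lam1), math.sqrt(max(lam2, 0.0))
    grid = []
    for r in range(rows):
        a = -1.0 + 2.0 * r / (rows - 1) if rows > 1 else 0.0
        row = []
        for c in range(cols):
            b = -1.0 + 2.0 * c / (cols - 1) if cols > 1 else 0.0
            row.append(tuple(m + a * s1 * u + b * s2 * w
                             for m, u, w in zip(mu, e1, e2)))
        grid.append(row)
    return grid


# Hypothetical data: large spread along (1, 2, 0), small spread along z.
data = [(float(t), 2.0 * t, z) for t in range(5) for z in (-0.5, 0.5)]
grid = linear_init(data, 3, 3)
for row in grid:
    print([tuple(round(v, 2) for v in w) for w in row])
```

With this start, the map is already ordered along the main directions of variance. Training then mainly refines the map instead of first untangling a random configuration.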