Fig. 5.7. Output of SOM where distances between neurons are mapped to (a) gray scale; (b) color
map (Kohonen [19]).
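The caption does not say exactly how the distances between neurons are turned into gray levels. A common convention is the unified distance matrix (U-matrix), and it is assumed here. For each neuron, average the Euclidean distance between its weight vector and those of its immediate grid neighbors, then rescale the result to gray levels. This is a minimal standard-library Python sketch. The names `u_matrix` and `to_gray` are hypothetical. The rectangular 4-neighbor grid is also an assumption, since SOMs are often drawn on hexagonal grids.

```python
import math

def u_matrix(weights):
    """Average distance from each neuron's weight vector to its 4-connected grid neighbors.

    weights[r][c] is the weight vector of the neuron at row r, column c
    (rectangular grid assumed). Large values mark boundaries between clusters;
    small values mark the interior of dense regions.
    """
    rows, cols = len(weights), len(weights[0])
    result = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            dists = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    dists.append(math.dist(weights[r][c], weights[nr][nc]))
            # A 1 x 1 map has no neighbors; report 0.0 instead of dividing by zero.
            result[r][c] = sum(dists) / len(dists) if dists else 0.0
    return result

def to_gray(umat):
    """Rescale U-matrix values to integer gray levels 0..255 (0 = smallest distance)."""
    flat = [v for row in umat for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0
    return [[round(255 * (v - lo) / span) for v in row] for row in umat]
```

Consider a 1 × 3 map whose 1-D weights are 0, 1, and 4. For this map, `u_matrix` gives 1.0, 2.0, 3.0 and `to_gray` gives 0, 128, 255. In this sketch the largest gray value marks the largest jump between neighboring weight vectors, which is a likely cluster boundary.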
For clustering analysis, what remains is to assign a cluster label to each
observation in matrix X. After visualizing the output of SOM, one can label the
regions of the map that appear to be clusters with different numbers. For
instance, in Fig. 5.7(b), we can label the top region, the bottom-left region,
and the bottom-right region as 1, 2, and 3, respectively. Then, for each
observation, the neuron on the map with the highest similarity (or lowest
dissimilarity) to the observation is identified. The observation is assigned the
label of the region in which the identified neuron falls. If the identified
neuron falls in the areas separating the cluster regions, the observation is
identified as an outlier.
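The labeling procedure above can be sketched in Python as follows. The sketch makes several assumptions. Euclidean distance serves as the dissimilarity. The hand-drawn regions are encoded as a hypothetical `region_labels` grid, which holds `None` for neurons lying in the separating areas. Outliers receive the hypothetical label `-1`. The function names are illustrative and not from the source.

```python
import math

def best_matching_unit(x, weights):
    """Return (row, col) of the neuron whose weight vector is closest to x (Euclidean)."""
    best, best_dist = None, math.inf
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            d = math.dist(x, w)  # lowest dissimilarity = highest similarity
            if d < best_dist:
                best, best_dist = (r, c), d
    return best

def assign_labels(X, weights, region_labels, outlier=-1):
    """Give each observation the region label of its best-matching neuron.

    region_labels[r][c] is the cluster number the analyst drew on the map for
    neuron (r, c), or None if that neuron lies in a separating (boundary) area.
    Observations whose best-matching neuron is a boundary neuron get `outlier`.
    """
    labels = []
    for x in X:
        r, c = best_matching_unit(x, weights)
        region = region_labels[r][c]
        labels.append(outlier if region is None else region)
    return labels
```

Take a 1 × 3 map whose left and right neurons were labeled 1 and 2 and whose middle neuron lies on the boundary. For this map, `assign_labels` returns `[1, 2, -1]` for observations nearest the left, right, and middle neurons. `math.dist` requires Python 3.8 or later.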
The advantages of SOM are fourfold. First, it does not require the number of
clusters as an input; it clusters the observations and identifies the number of
clusters at the same time. Second, the observations enter the algorithm
sequentially, so the whole dataset does not have to be loaded into memory for
clustering analysis. This is very helpful when the dataset is too large to fit
into memory at once. It is also very helpful when the whole dataset is not
available in advance but the observations arrive sequentially. Third, SOM is a
distance-preserving data visualization method. It maps the high-dimensional
dataset onto a 2-D map, where the distances between observations are
approximately preserved in the distances between the weight vectors associated
with neurons, and these distances are visualized by gray scale or color maps, as
in Fig. 5.7. Fourth, statistically, SOM approximates the density distribution of
the dataset. For example, in Fig. 5.7(b), we can see that the dataset has three
high-density areas, and different dense regions have different density
distributions.
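The second advantage follows from the classic online SOM update. Each incoming observation x first selects its best-matching neuron. Every weight vector w then moves toward x by η(t)·h(t)·(x − w), and after that the observation can be discarded. The Python sketch below consumes any iterator, including an endless generator, and keeps only the weight grid in memory. Several choices here are assumptions rather than details from the source: random initialization, a Gaussian neighborhood on a rectangular grid, linearly decaying learning rate and radius, and the parameter names `n_steps`, `lr0`, and `sigma0`. Note that only the grid size is supplied, not the number of clusters.

```python
import math
import random

def train_som_online(stream, rows, cols, dim, n_steps, lr0=0.5, sigma0=None, seed=0):
    """Train a rows x cols SOM by consuming observations one at a time from `stream`.

    Only the weight grid is stored; each observation is used for a single update.
    Learning rate and neighborhood radius decay linearly over n_steps observations.
    """
    rng = random.Random(seed)
    weights = [[[rng.random() for _ in range(dim)] for _ in range(cols)]
               for _ in range(rows)]
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2.0
    for t, x in enumerate(stream):
        if t >= n_steps:
            break
        frac = t / n_steps
        lr = lr0 * (1.0 - frac)                 # learning rate eta(t)
        sigma = max(sigma0 * (1.0 - frac), 0.5)  # neighborhood radius, floored at 0.5
        # 1. Best-matching unit: the neuron whose weight vector is closest to x.
        br, bc = min(((r, c) for r in range(rows) for c in range(cols)),
                     key=lambda rc: math.dist(x, weights[rc[0]][rc[1]]))
        # 2. Pull every neuron toward x, weighted by a Gaussian of its grid distance to the BMU.
        for r in range(rows):
            for c in range(cols):
                h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2.0 * sigma ** 2))
                w = weights[r][c]
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])
    return weights

def alternating_points():
    """Endless stream alternating between two well-separated points."""
    i = 0
    while True:
        yield (0.1, 0.1) if i % 2 == 0 else (0.9, 0.9)
        i += 1

W = train_som_online(alternating_points(), rows=1, cols=4, dim=2, n_steps=2000)
```

Because the generator never ends, the loop stops after `n_steps` observations. Memory use depends only on the map size, not on how many observations arrive.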
The biggest problem of SOM is that it is subjective. Although SOM identifies the
number of clusters and clusters the objects at the same time, the number of
clusters is still based on human subjective judgment. For instance, in