Self-Organizing Maps and Unsupervised Classification - Neural Networks: Methodology and Applications


of the code provides information about the problem of interest. The basic

principle of the algorithm is to favor the emergence of clusters (the partition

subsets) that are appropriate for the application under consideration. If the

application involves a classification task into S classes, each partition subset

must be included in one class as completely as possible. Then, one can assign

one of the S classes to a whole cluster. Since each subset is assigned to one

neuron of the map, the classification problem amounts to labeling each neuron

of the map. The label set is the set of the S classes of the problem. Labeling

can be performed in two different ways. Since each reference vector represents

a subset of the partition P , and since the reference vector may be interpreted

as an average experiment, it is possible to use expert knowledge to recognize

the class of the reference vector on the basis of its characteristics:

1. by asking a domain expert to classify some data extracted from the training set,

2. by first aggregating the neurons on a statistical basis and then using expert knowledge to label the clusters.

7.4.1 Labeling the Map Using Expert Data

Assume that an $S$-class classification task must be performed, and that the labels of those classes must belong to a label set $L = \{l_i,\ i = 1, \ldots, S\}$. At the end of SOM training, when all parameters of the map are estimated, each observation $z$ is assigned to a neuron $c = \chi(z)$, so that the label $l_c$ of that neuron can be assigned to the observation. Therefore, the problem is: how to label the neurons of the map with the labels of $L$?

Labeling the neurons of the map is the first step in the design of a classifier from a SOM. If the amount of data classified by the expert is very large, labeling may be performed by majority voting (see Fig. 7.17 below):

• Assign the expert-classified data to the various neurons of the map using the allocation function provided by the SOM training.

• For every neuron $c$, select the label $l_i$ that is the most commonly used label for the expert-classified data assigned to neuron $c$.

• All the data belonging to the subset that is represented by neuron $c$ are now labeled by label $l_i$.
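The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the allocation function is taken to be the nearest reference vector in Euclidean distance (a common SOM choice, but not specified in this section), and the function names `assign_to_neurons`, `label_neurons_by_majority` and `classify` are hypothetical.

```python
from collections import Counter

import numpy as np


def assign_to_neurons(data, reference_vectors):
    """Index of the nearest reference vector for each row of `data`.

    Assumption: the allocation function is nearest-neighbor search
    in Euclidean distance.
    """
    diffs = data[:, None, :] - reference_vectors[None, :, :]
    return np.argmin((diffs ** 2).sum(axis=2), axis=1)


def label_neurons_by_majority(expert_data, expert_labels, reference_vectors):
    """Map each neuron index to its most frequent expert label."""
    # Step 1: assign the expert-classified data to the neurons.
    winners = assign_to_neurons(expert_data, reference_vectors)
    votes = {}
    for neuron, label in zip(winners, expert_labels):
        votes.setdefault(int(neuron), Counter())[label] += 1
    # Step 2: keep the most common label for every neuron that received data.
    # Neurons without any expert-classified data get no entry.
    return {n: counts.most_common(1)[0][0] for n, counts in votes.items()}


def classify(data, reference_vectors, neuron_labels, unknown=None):
    """Step 3: every observation inherits the label of its neuron."""
    winners = assign_to_neurons(data, reference_vectors)
    return [neuron_labels.get(int(c), unknown) for c in winners]
```

Observations that fall on a neuron that received no expert-classified data are returned as `unknown` rather than being given an arbitrary label; that is a design choice of this sketch, not something prescribed by the text.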

At the end of the labeling phase, the set of neurons c that have the same label

l can be used to approximate the probability distribution of the data of class

l . The larger the amount of expert-classified data, the better the classifier.

Of course, neurons that represent data lying on the boundaries of the classes

may get the wrong label. Another source of error is the lack of expert-classified

data in some subset represented by a given neuron: the corresponding region

of the data space is thus poorly identified.
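Both sources of error can be made visible by inspecting the votes behind each label. The sketch below is a hypothetical diagnostic, not a procedure taken from the text: it assumes the neuron index of each expert-classified observation is already available as `winners`, and the thresholds `min_count` and `min_purity` are arbitrary illustrative values.

```python
from collections import Counter


def labeling_diagnostics(winners, expert_labels, n_neurons,
                         min_count=1, min_purity=0.6):
    """Report, per neuron, a status, the vote count and the majority share.

    status is "unlabeled" when fewer than `min_count` expert-classified
    observations fall on the neuron, "ambiguous" when the majority label
    holds less than `min_purity` of the votes, and "ok" otherwise.
    """
    votes = [Counter() for _ in range(n_neurons)]
    for neuron, label in zip(winners, expert_labels):
        votes[neuron][label] += 1

    report = {}
    for neuron, counts in enumerate(votes):
        total = sum(counts.values())
        if total < min_count:
            # Lack of expert data: this region is poorly identified.
            report[neuron] = ("unlabeled", total, None)
            continue
        top = counts.most_common(1)[0][1]
        purity = top / total
        # A low majority share suggests a neuron lying on a class boundary.
        status = "ok" if purity >= min_purity else "ambiguous"
        report[neuron] = (status, total, purity)
    return report
```

Neurons reported as "unlabeled" or "ambiguous" are natural candidates for requesting additional expert-classified data.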
