of the code provides information about the problem of interest. The basic
principle of the algorithm is to favor the emergence of clusters (the partition
subsets) that are appropriate for the application under consideration. If the
application involves a classification task into S classes, each partition subset
must be included in one class as completely as possible. Then, one can assign
one of the S classes to a whole cluster. Since each subset is assigned to one
neuron of the map, the classification problem amounts to labeling each neuron
of the map. The label set is the set of the S classes of the problem. Labeling
can be performed in two different ways. Since each reference vector represents
a subset of the partition P, and since the reference vector may be interpreted
as an average experiment, it is possible to use expert knowledge to recognize
the class of the reference vector on the basis of its characteristics:
1. by asking an expert of the domain to classify some data extracted from
the training set,
2. by first aggregating the neurons on a statistical basis and then using
expert knowledge to label the clusters.
7.4.1 Labeling the Map Using Expert Data
Assume that an S-class classification task must be performed, and that the
labels of those classes must belong to a label set L = {l_i, i = 1, ..., S}. At
the end of SOM training, when all parameters of the map are estimated, each
observation z is assigned to a neuron c = χ(z), so that the label l_c of that
neuron can be assigned to the observation. Therefore, the problem is: how to
label the neurons of the map with the labels of L?
Labeling the neurons of the map is the first step in the design of a classifier
from a SOM. If the amount of data classified by the expert is very large,
labeling may be performed by majority voting (see Fig. 7.17 hereafter):
1. Assign the expert-classified data to the various neurons of the map using
the allocation function provided by the SOM training.
2. For every neuron c, select the label l_i that is the most commonly used
label for the expert-classified data assigned to neuron c.
3. All the data belonging to the subset that is represented by neuron c are
now labeled by label l_i.
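
The majority-voting procedure above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the allocation function χ is taken to be the nearest reference vector in Euclidean distance, classes are encoded as integers 0, ..., S-1, and the function names, the toy map and the convention of marking a neuron without expert data by -1 are all hypothetical choices, not part of the text.

```python
import numpy as np

def allocate(observations, reference_vectors):
    """Assumed allocation function chi: index of the nearest reference vector."""
    # Pairwise squared Euclidean distances, shape (n_observations, n_neurons).
    diffs = observations[:, None, :] - reference_vectors[None, :, :]
    return np.argmin((diffs ** 2).sum(axis=2), axis=1)

def label_neurons_by_vote(reference_vectors, expert_data, expert_labels, n_classes):
    """Give each neuron the most frequent expert label among its assigned data."""
    n_neurons = reference_vectors.shape[0]
    winners = allocate(expert_data, reference_vectors)
    # votes[c, i] counts the expert observations of class i assigned to neuron c.
    votes = np.zeros((n_neurons, n_classes), dtype=int)
    np.add.at(votes, (winners, expert_labels), 1)
    neuron_labels = votes.argmax(axis=1)
    # Assumption: neurons that received no expert-classified data stay unlabeled (-1).
    neuron_labels[votes.sum(axis=1) == 0] = -1
    return neuron_labels, votes

# Toy map with three neurons in a 2-D data space (illustrative values only).
reference_vectors = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
expert_data = np.array([[0.1, 0.0], [0.2, -0.1], [0.9, 1.2], [1.1, 0.8], [0.4, 0.4]])
expert_labels = np.array([0, 0, 1, 1, 1])

neuron_labels, votes = label_neurons_by_vote(
    reference_vectors, expert_data, expert_labels, n_classes=2
)
print(neuron_labels)  # [ 0  1 -1]
```

In this toy example the first neuron receives two observations of class 0 and one of class 1, so it takes label 0 even though one of its observations belongs to the other class; the third neuron receives no expert data and is left unlabeled.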
At the end of the labeling phase, the set of neurons c that have the same label
l can be used to approximate the probability distribution of the data of class
l . The larger the amount of expert-classified data, the better the classifier.
Of course, neurons that represent data lying on the boundaries of the classes
may get the wrong label. Another source of error is the lack of expert-classified
data in some subset represented by a given neuron: the corresponding region
of the data space is thus poorly identified.
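
Both sources of error can be made visible once the map is labeled. The sketch below classifies a new observation with the label of its winning neuron and computes, for each neuron, how many expert-classified observations it received and which fraction of them agrees with the majority label. It assumes a nearest-reference-vector allocation in Euclidean distance, a per-neuron vote table, and a label of -1 for neurons without expert data; the names `classify` and `vote_diagnostics` and the numerical values are hypothetical.

```python
import numpy as np

def classify(observation, reference_vectors, neuron_labels):
    """Return (winning neuron, its label); the label is -1 if the neuron is unlabeled."""
    distances = ((reference_vectors - observation) ** 2).sum(axis=1)
    winner = int(np.argmin(distances))
    return winner, int(neuron_labels[winner])

def vote_diagnostics(votes):
    """Per neuron: number of expert observations and fraction agreeing with the majority."""
    support = votes.sum(axis=1)
    purity = np.full(votes.shape[0], np.nan)
    has_data = support > 0
    purity[has_data] = votes[has_data].max(axis=1) / support[has_data]
    return support, purity

# Illustrative labeled map: votes[c, i] counts expert data of class i on neuron c.
reference_vectors = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
neuron_labels = np.array([0, 1, -1])
votes = np.array([[2, 1], [0, 2], [0, 0]])

winner, label = classify(np.array([4.0, 4.5]), reference_vectors, neuron_labels)
support, purity = vote_diagnostics(votes)
print(winner, label)   # 2 -1
print(support)         # [3 2 0]
```

A low agreement fraction suggests a neuron lying on a class boundary, while a support of zero marks a region of the data space for which no expert-classified data are available.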