Fig. 7.17. Map labeling using expert-classified data. Classified data are assigned to
the relevant neurons of the map. Each neuron is then labeled by majority voting
among the classified data that are allocated to that neuron.
7.4.2 Searching a Partition that Is Appropriate to the Classes
If the amount of expert-labeled data is too small, the above labeling method
is inappropriate. The majority-voting result then has a large variance and may
produce classification errors with significant probability. A single wrongly
labeled observation may be enough to assign the wrong label to the associated
neuron, so that a whole region of the data space is wrongly classified.
Furthermore, because labeled data are scarce, a significant number of subsets
of the partition may contain no labeled data at all, so that the algorithm
cannot assign them any label.
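
The difficulty described above can be illustrated with a minimal sketch. The
function name, the toy data, and the convention of returning None for a neuron
without labeled data are assumptions made for this illustration only; they are
not taken from the text.

```python
from collections import Counter

def label_neurons(n_neurons, assignments, labels):
    """Label each neuron by majority vote among the expert-labeled
    observations allocated to it; neurons with no labeled data get None."""
    votes = [Counter() for _ in range(n_neurons)]
    for neuron, label in zip(assignments, labels):
        votes[neuron][label] += 1
    # Ties are broken arbitrarily (first label encountered wins).
    return [v.most_common(1)[0][0] if v else None for v in votes]

# Hypothetical toy data: 5 neurons, only 4 expert-labeled observations.
assignments = [0, 0, 0, 3]        # neuron to which each labeled observation is allocated
labels = ["A", "A", "B", "B"]     # expert labels of those observations
neuron_labels = label_neurons(5, assignments, labels)
print(neuron_labels)
```

With so few labeled observations, three of the five neurons receive no label,
and the label of neuron 3 rests on a single observation: if that observation
were mislabeled, the whole subset would be mislabeled with it.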
In that case, an additional phase can be introduced, in which the observation
subsets are themselves clustered as appropriately as possible. A coarser
partition is sought, and labeling is performed after that additional
clustering phase. When several subsets of the partition are fused, more
expert-classified data are available to label the resulting larger subset. Of
course, as before, the whole process is valid only if the original clustering
is consistent with the classification, so that majority voting can select the
right label.
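
A minimal sketch of labeling after fusion follows. It assumes that the coarser
partition is already available as a list giving, for each neuron, the
identifier of the group it was fused into; how that grouping is obtained is not
shown here. The function name and the toy data are hypothetical.

```python
from collections import Counter

def label_fused_subsets(neuron_group, assignments, labels):
    """Pool expert labels over groups of fused neurons, then vote per group.

    neuron_group[i] is the (hypothetical) coarse group id of neuron i.
    Returns one label per neuron; None if its whole group has no labeled data.
    """
    votes = {}
    for neuron, label in zip(assignments, labels):
        votes.setdefault(neuron_group[neuron], Counter())[label] += 1
    winners = {g: c.most_common(1)[0][0] for g, c in votes.items()}
    return [winners.get(g) for g in neuron_group]

# Hypothetical toy data: neurons 0-2 are fused into group 0, neurons 3-4 into group 1.
neuron_group = [0, 0, 0, 1, 1]
assignments = [0, 0, 0, 3]        # neuron to which each labeled observation is allocated
labels = ["A", "A", "B", "B"]     # expert labels of those observations
fused_labels = label_fused_subsets(neuron_group, assignments, labels)
print(fused_labels)
```

In this toy case every neuron receives a label, because each fused group
contains at least one labeled observation. The result is only as good as the
grouping: if a group mixes several classes, the vote cannot recover them.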
If the map and the partition that are provided by SOM are assumed to
be relevant, then the following two additional assumptions are taken into
consideration:
- The data quantization is correct, so that each reference vector is a good
  representative of its allocated data.
- Topology is relevant: two subsets that are represented by neighboring
  neurons on the map contain observations that are close in data space.
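
The text does not say how these two assumptions could be checked. One possible
way, sketched below, is to compute two common SOM diagnostics: the mean
quantization error (average distance from an observation to its closest
reference vector) and the topographic error (fraction of observations whose two
closest reference vectors are not neighbors on the map). The choice of these
diagnostics, the 8-neighborhood on a rectangular grid, and all names and data
are assumptions made for this illustration.

```python
import math

def som_diagnostics(data, ref_vectors, grid_pos):
    """Return (mean quantization error, topographic error).

    data        : list of observation vectors
    ref_vectors : list of reference vectors, one per neuron
    grid_pos    : list of (row, col) map coordinates, one per neuron
    """
    q_err = 0.0
    topo_violations = 0
    for x in data:
        # Rank neurons by distance between x and their reference vector.
        ranked = sorted(range(len(ref_vectors)),
                        key=lambda i: math.dist(x, ref_vectors[i]))
        best, second = ranked[0], ranked[1]
        q_err += math.dist(x, ref_vectors[best])
        # The two closest neurons should be neighbors on the map (8-neighborhood).
        dr = abs(grid_pos[best][0] - grid_pos[second][0])
        dc = abs(grid_pos[best][1] - grid_pos[second][1])
        if max(dr, dc) > 1:
            topo_violations += 1
    n = len(data)
    return q_err / n, topo_violations / n

# Hypothetical 1 x 3 map whose reference vectors are ordered along a line.
ref_vectors = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]
grid_pos = [(0, 0), (0, 1), (0, 2)]
data = [[0.1, 0.0], [1.0, 0.2], [1.9, 0.0]]
q_err, topo_err = som_diagnostics(data, ref_vectors, grid_pos)
print(q_err, topo_err)
```

A small quantization error relative to the spread of the data supports the
first assumption, and a topographic error close to zero supports the second.
Neither value proves that the partition is consistent with the classes.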