order to facilitate manual annotation. Some of them present various hierarchical levels
of taxonomy, such as the tool presented by Woitek et al. [71].
The size of the set of labels can be limited and pre-defined or extendable [11,
14, 28, 56, 78]. With multiple classes and multiple labels, the number of data
samples carrying each label is not always large enough to be statistically significant.
The problem of a relatively small number of annotated samples is also associated
with the problem of consistent labeling of the training data, or the “ground truth”.
The issue of obtaining training data for meta-level classifiers, such as classifiers
that combine binary classifiers, has also been addressed. The simplest method, if
there is enough data, is to split the data so that one part is used for the training,
testing and validation of the binary classifiers, and another part is used
for training or for examination of the combining (meta) classifier. Shiraishi and
Fukumizu [55] discuss two other methods: reusing the data used for the binary
classifiers for the combining algorithm as well, which can lead to over-fitting, and
stacking via cross-validation, bootstrap or bagging [41].
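Stacking via cross-validation can be illustrated with a minimal sketch. It is not the procedure of [55] or [41]; the toy data, the threshold-based base learner `train_threshold`, and the accuracy-weighted combiner `train_meta` are all hypothetical names chosen for illustration. The point it shows is that each sample's meta-level features are produced by binary classifiers that never saw that sample during training, which avoids reusing the same data for both levels.

```python
def train_threshold(samples, feature):
    """Hypothetical base learner: threshold one feature midway between class means."""
    pos = [x[feature] for x, y in samples if y == 1]
    neg = [x[feature] for x, y in samples if y == 0]
    mean_pos = sum(pos) / len(pos)
    mean_neg = sum(neg) / len(neg)
    threshold = (mean_pos + mean_neg) / 2
    sign = 1 if mean_pos >= mean_neg else -1
    return lambda x: 1 if sign * (x[feature] - threshold) > 0 else 0


def out_of_fold_predictions(samples, n_features, k=5):
    """For each fold, train base classifiers on the other folds and predict the held-out fold."""
    meta_rows = [None] * len(samples)
    for fold in range(k):
        train = [s for i, s in enumerate(samples) if i % k != fold]
        bases = [train_threshold(train, f) for f in range(n_features)]
        for i, (x, y) in enumerate(samples):
            if i % k == fold:
                # Base outputs become the meta-level input; y stays the target.
                meta_rows[i] = ([clf(x) for clf in bases], y)
    return meta_rows


def train_meta(meta_rows):
    """Hypothetical combiner: weight each base classifier by its out-of-fold accuracy."""
    n_bases = len(meta_rows[0][0])
    weights = [
        sum(1 for preds, y in meta_rows if preds[j] == y) / len(meta_rows)
        for j in range(n_bases)
    ]

    def meta(preds):
        score = sum(w if p == 1 else -w for w, p in zip(weights, preds))
        return 1 if score > 0 else 0

    return meta, weights


# Toy data (an assumption for illustration): feature 0 is informative, feature 1 is not.
samples = [([float(i % 7) + 5 * (i % 2), float(i % 3)], i % 2) for i in range(40)]

meta_rows = out_of_fold_predictions(samples, n_features=2, k=5)
meta, weights = train_meta(meta_rows)
```

In a fuller pipeline, the base classifiers would typically be retrained on all of the data once the combiner has been fitted on the out-of-fold predictions.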
In most cases, the output for each label is binary in nature, i.e. a label either
exists or does not. However, there are cases in which the level of recognition of a class is
important for tracking spatial and temporal processes and tendencies, such as the
level of occurrence of each label between successive data samples recorded during
sustained human-computer interactions, or in the analysis of geospatial
information [1, 50, 59]. In such cases, the classes are not always mutually exclusive, and
the data samples are not necessarily independent. There are fields in which the
situation is reversed, such as in geography, in which “everything is related to
everything else but nearby things are more related than distant things” [50].
All these considerations affect the classification process: they define the input
and the output of the classification system and pose requirements on the
classification method.
5 Classification Approaches
This section summarizes the definitions of binary, multi-class and multi-label
classification in order to establish common ground before going into more
detail in the following sections. Schematic descriptions of these classification
methods are presented in Fig. 2.
5.1 Binary Classification
In binary classification, a classifier has to decide between two possible choices:
either a YES/NO answer, i.e. whether a sample belongs to a class or not, or a choice between two
disjoint classes. This is the most common behavior of well-known classifiers [41].
5.2 Multi-class Classification
In multi-class classification, a classifier or a classification system has to choose
between more than two classes, but the sample must still be assigned to one target
class only. In other words, each sample is assigned a single class label from a set
of n labels, where n > 2. Compared with binary classification, the set of possible
classes is larger (more than two).
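The difference between the two settings can be sketched in a few lines. This is an illustration only, assuming that a classifier produces numeric scores; the function names, the 0.5 threshold, and the example labels are hypothetical. A binary decision reduces to a single YES/NO answer, while a multi-class decision selects exactly one label out of n > 2 candidates.

```python
def classify_binary(score, threshold=0.5):
    """Binary decision: the sample either belongs to the class (True) or not (False)."""
    return score >= threshold


def classify_multiclass(scores):
    """Multi-class decision: pick exactly one label from a set of n > 2 labels."""
    if len(scores) <= 2:
        raise ValueError("multi-class classification needs more than two classes")
    # The highest-scoring label wins; every sample receives a single class label.
    return max(scores, key=scores.get)


# Assumed example scores, not taken from any dataset.
is_member = classify_binary(0.7)
label = classify_multiclass({"joy": 0.1, "anger": 0.6, "fear": 0.3})
```

Here `is_member` holds the YES/NO answer of the binary case, and `label` holds the single class chosen among three candidates.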