Some researchers focus mainly on classification methods and apply them to
various available datasets, which present certain types of complexity. Others
focus on a certain knowledge domain and devise methods suited to the complexity
presented by that domain and its data.
As can be seen from the discussion above, in many cases, “Getting intimate with
the data” [70] is a ground rule that may affect the classification results. Therefore
many of the classification solutions, although usually based on a few common
basic methods, differ in the treatment, which is tailored to the processed data.
However, these tailored solutions are often suitable for various types of data and
applications that share similar properties. The similarities may include the number
of available classes, the relations between classes (taxonomic method), desired
number of output classes per sample, the amount of available data per class, the
connections between adjacent samples or between different parts of a single sam-
ple, the number of features that distinguish between classes, the number of sets of
features that can be used for classification, the levels of feature abstraction or the
number of pre-processing stages, and the like. The next section discusses a few of
the considerations related to annotation or labeling methods. These have a
considerable effect on the classification process, defining the inputs, the outputs
and therefore also the demands placed on the classification algorithm.
4 Data and Annotation
Large volumes of domain knowledge are available but they are not always con-
structed in a manner that can be processed by machines. The selection of classes
and the association of items with these classes have an immense effect on the
classification goals, design and capabilities [22, 46, 57, 61]. Ontologies and
taxonomies are used to label the datasets, i.e. to assign concepts (labels, groups,
categories) to instances (individual concrete objects or samples). The instances
and the concepts are, respectively, the inputs to the classification system and its
outputs. Ontologies and taxonomies therefore also define many of the demands on
the properties of the classification system. Much effort is put into
building domain-specific ontologies. The ontologies present a limited vocabulary
specific to the knowledge domain, and specify “what exists” and therefore also
what could be derived. These often also include synonyms and nested terms or
conjunctures. In other words, an ontology is a formal, explicit specification of a
shared conceptualization of a domain of interest [24].
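
The labeling role of an ontology described above can be illustrated with a minimal Python sketch. It is not taken from the cited works: the class names (Concept, Ontology), the function annotate and the toy vocabulary are all hypothetical, chosen only to show a limited vocabulary with synonyms and nested terms being used to assign concepts to instances.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Concept:
    """One term in a toy ontology (all names here are hypothetical)."""
    name: str
    synonyms: Set[str] = field(default_factory=set)
    parent: Optional[str] = None  # nested term: name of the broader concept


class Ontology:
    """A limited vocabulary: only registered concepts 'exist'."""

    def __init__(self, concepts):
        self.concepts = {c.name: c for c in concepts}
        # Map every accepted surface form (name or synonym) to its concept.
        self._lookup = {}
        for c in concepts:
            for form in {c.name, *c.synonyms}:
                self._lookup[form.lower()] = c.name

    def resolve(self, raw_label):
        """Return the concept name for a raw label, or None if it is unknown."""
        return self._lookup.get(raw_label.strip().lower())

    def ancestors(self, name):
        """Walk up the nesting from a concept towards its root."""
        chain = []
        parent = self.concepts[name].parent
        while parent is not None:
            chain.append(parent)
            parent = self.concepts[parent].parent
        return chain


def annotate(instances, ontology):
    """Assign concepts to instances; skip labels the ontology does not define."""
    labeled = []
    for instance_id, raw_label in instances:
        concept = ontology.resolve(raw_label)
        if concept is not None:
            labeled.append((instance_id, concept))
    return labeled


# Toy vocabulary and samples, invented purely for illustration.
toy = Ontology([
    Concept("animal"),
    Concept("dog", synonyms={"canine"}, parent="animal"),
    Concept("cat", synonyms={"feline"}, parent="animal"),
])
samples = [("s1", "Canine"), ("s2", "cat"), ("s3", "rock")]
labels = annotate(samples, toy)
```

In this sketch the sample labeled "rock" is dropped because the vocabulary does not define it, which reflects the point that an ontology specifies what exists and therefore what can be derived. A real annotation scheme would need a policy for such out-of-vocabulary items.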
The term taxonomy refers to the manner in which the terms are organized and
presented. Taxonomies represent concepts and the relations between them, and
usually refer to a formal organization, such as the organization of families and
species in biology. For knowledge domains that have been characterized, there are
often several taxonomic models for representing and organizing the knowledge.
These range from a few mutually exclusive and easily defined categories (the cate-
gorical approach), through representations of the knowledge space and the individ-
ual categories on a system of few dimensions or facets (the dimensional approach),
to prototypes, a hierarchical organization of groups, or tree-like taxonomies in
which the trees have several root categories (the prototypical approach) [25, 57].
For example, in the case of affective states [60], the categorical approach refers to