Inference of Co-occurring Classes: Multi-class and Multi-label Classification - Computational Intelligence Paradigms in Advanced Pattern Classification


Some researchers focus mainly on classification methods and apply them to various available datasets that present certain types of complexity; others focus on a particular knowledge domain and devise methods suited to the complexity of that domain and its data. As can be seen from the discussion above, in many cases “Getting intimate with the data” [70] is a ground rule that may affect the classification results. Therefore, many classification solutions, although usually based on a few common basic methods, differ in the treatment that is tailored to the processed data. However, these tailored solutions are often suitable for various types of data and applications that share similar properties. The similarities may include the number of available classes, the relations between classes (taxonomic method), the desired number of output classes per sample, the amount of available data per class, the connections between adjacent samples or between different parts of a single sample, the number of features that distinguish between classes, the number of sets of features that can be used for classification, the levels of feature abstraction or the number of pre-processing stages, and the like. The next section discusses a few of the considerations related to annotation or labeling methods. These have a considerable effect on the classification process, defining the inputs, the outputs, and therefore also the demands on the classification algorithm.
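Several of the properties listed above (number of classes, number of labels per sample, amount of data per class) can be read directly from the annotations before any classifier is chosen. The following is a minimal sketch, assuming the annotations are available as one set of labels per sample; the function name `label_profile` and the toy labels are hypothetical and serve only as illustration.

```python
from collections import Counter


def label_profile(annotations):
    """Summarize a labeled dataset given as a list of label sets (one per sample)."""
    # Count how many samples carry each class label.
    per_class = Counter(label for labels in annotations for label in labels)
    n_samples = len(annotations)
    total_labels = sum(len(labels) for labels in annotations)
    return {
        "n_classes": len(per_class),
        "samples_per_class": dict(per_class),
        # Average number of labels attached to a sample.
        "mean_labels_per_sample": total_labels / n_samples if n_samples else 0.0,
        # True if at least one sample carries more than one label.
        "is_multi_label": any(len(labels) > 1 for labels in annotations),
    }


# Hypothetical toy annotations, invented for this example.
toy = [{"joy"}, {"joy", "surprise"}, {"anger"}, {"surprise"}]
profile = label_profile(toy)
print(profile)
```

A profile of this kind only describes the annotations; it says nothing about the features or about relations between classes, which need to be examined separately.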

4 Data and Annotation

Large volumes of domain knowledge are available, but they are not always constructed in a manner that can be processed by machines. The selection of classes and the association of items with these classes have an immense effect on the classification goals, design and capabilities [22, 46, 57, 61]. Ontologies and taxonomies are used to label datasets, i.e. to assign concepts (labels, groups, categories) to instances (individual concrete objects or samples). The instances are the inputs to the classification system and the concepts are its outputs. Therefore, ontologies and taxonomies also define many of the demands on the classification system. Much effort is put into building domain-specific ontologies. The ontologies present a limited vocabulary specific to the knowledge domain, and specify “what exists” and therefore also what could be derived. These often also include synonyms and nested terms or conjunctures. In other words, an ontology is a formal explicit specification of a shared conceptualization of a domain of interest [24].
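The role of a limited vocabulary with synonyms can be illustrated with a small sketch. It assumes a toy ontology stored as a dictionary from canonical concepts to synonym sets; the vocabulary, the names `ONTOLOGY`, `build_lookup` and `annotate`, and the example terms are all hypothetical and are not taken from any of the cited ontologies.

```python
# Hypothetical toy ontology: canonical concept -> set of synonyms.
ONTOLOGY = {
    "cat": {"feline", "kitty"},
    "dog": {"canine", "puppy"},
}


def build_lookup(ontology):
    """Map every known term (concept or synonym) to its canonical concept."""
    lookup = {}
    for concept, synonyms in ontology.items():
        lookup[concept] = concept
        for synonym in synonyms:
            lookup[synonym] = concept
    return lookup


def annotate(raw_terms, lookup):
    """Return the canonical concepts for an instance and the terms outside the vocabulary."""
    concepts = set()
    unknown = []
    for term in raw_terms:
        key = term.strip().lower()
        if key in lookup:
            concepts.add(lookup[key])
        else:
            # The ontology specifies "what exists"; anything else cannot be a label.
            unknown.append(term)
    return concepts, unknown


LOOKUP = build_lookup(ONTOLOGY)
labels, rejected = annotate(["Kitty", "canine", "unicorn"], LOOKUP)
print(labels, rejected)
```

In this sketch the set of canonical concepts is also the set of possible classifier outputs, which is one way the annotation scheme constrains the classification system.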

The term taxonomy refers to the manner in which the terms are organized and presented. Taxonomies represent concepts and the relations between them, and usually refer to a formal organization, such as the organization of families and species in biology. For knowledge domains that have been characterized, there are often several taxonomic models for representing and organizing the knowledge. These range from a few mutually exclusive and easily defined categories (the categorical approach), through representations of the knowledge space and the individual categories on a system of a few dimensions or facets (the dimensional approach), to prototypes, a hierarchical organization of groups, or tree-like taxonomies in which the trees have several root categories (the prototypical approach) [25, 57].

For example, in the case of affective states [60], the categorical approach refers to
