6.1.3 Combining Binary Classifiers
This section presents a few of the methods designed to combine binary classifiers.
Wu et al. [72] review several of the methods for combining pair-wise classifiers.
The most commonly used set of methods is based on voting [21, 30]: first all the pair-wise comparisons are constructed, and then the class that won the highest number of these comparisons is selected. Majority vote, as in politics, comprises various methods that enable choosing one or more candidates and aim to resolve conflicts [40, 60, 72]. Voting selects only the class label, not the probability of this selection.
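To make the voting rule concrete, the following sketch (in Python; the function name and the toy decisions are illustrative, not taken from any of the cited papers) tallies the pair-wise winners and returns the class with the most wins:

```python
import numpy as np

def one_vs_one_vote(pairwise_decisions, n_classes):
    """Combine pair-wise decisions by majority vote.

    pairwise_decisions: dict mapping a class pair (i, j), i < j,
    to the winning class (i or j) returned by that binary classifier.
    Returns the class that won the most pair-wise comparisons.
    """
    votes = np.zeros(n_classes, dtype=int)
    for winner in pairwise_decisions.values():
        votes[winner] += 1
    return int(np.argmax(votes))  # ties broken by the lowest class index

# Example: 3 classes, hence 3 pair-wise classifiers
decisions = {(0, 1): 0, (0, 2): 2, (1, 2): 2}
print(one_vs_one_vote(decisions, n_classes=3))  # class 2 wins two comparisons
```

Ties are possible under pure voting; the sketch breaks them by the lowest class index, but any of the conflict-resolution methods cited above could be substituted.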
Another approach comprises various methods for the summation of the pair-wise probabilities [26, 72]. Allwein et al. [2] suggest an exponential-loss decoding. Wu et al. [72] present two methods: the first obtains the probability estimates via an approximate solution to an identity based on the theory of finite Markov chains; the second is a coupling method. Hastie and Tibshirani [26] use the Bradley-Terry model for combining binary classifiers.
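As an illustration of the coupling idea, the following sketch fits class probabilities to a matrix of pair-wise estimates with a Bradley-Terry-style iteration in the spirit of Hastie and Tibshirani [26]; the function name, the uniform pairing weights, and the example matrix are assumptions made for the sake of the example:

```python
import numpy as np

def couple_pairwise_probs(r, n_iter=100, tol=1e-8):
    """Couple pair-wise estimates into class probabilities (a sketch).

    r[i, j] is the estimated probability that class i beats class j
    (with r[j, i] = 1 - r[i, j]); the diagonal is ignored.
    Returns class probabilities p such that the Bradley-Terry ratios
    mu[i, j] = p_i / (p_i + p_j) approximate r.
    """
    k = r.shape[0]
    p = np.full(k, 1.0 / k)  # start from the uniform distribution
    for _ in range(n_iter):
        p_old = p.copy()
        for i in range(k):
            num = sum(r[i, j] for j in range(k) if j != i)
            den = sum(p[i] / (p[i] + p[j]) for j in range(k) if j != i)
            p[i] *= num / den  # multiplicative Bradley-Terry update
        p /= p.sum()
        if np.abs(p - p_old).max() < tol:
            break
    return p

# Example with 3 classes: class 2 tends to win its pair-wise contests
r = np.array([[0.0, 0.6, 0.2],
              [0.4, 0.0, 0.3],
              [0.8, 0.7, 0.0]])
print(couple_pairwise_probs(r))  # highest mass on class 2
```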
Shiraishi and Fukumizu [55] review methods for combining binary Support Vector Machines (SVMs); combining binary SVMs is computationally more feasible than, and not inferior to, direct multi-class SVMs. They propose a method for combining relatively strong binary classifiers that is based on statistical techniques such as penalized logistic regression, stacking, and a sparsity-promoting penalty, and that does not require the binary classifiers to return probabilistic values. The method is effective for both the one-against-one and one-against-all paradigms, and its benefit is that an estimate of the conditional probability of each class can be obtained. They also propose selecting only the relevant binary classifiers by adding a group-lasso-type penalty while training the combining method.
Fernández et al. [19] present fuzzy rule-based pair-wise classification, in which the combination stage is treated as a decision-making problem solved by rules based on the maximal non-dominance criterion. Fuzzy rule-based classification systems combine fuzzy logic with statistical machine learning. They are widely used to solve classification problems (in medicine, unmanned vehicles, battlefield analysis, and intrusion detection) because their interpretable models, based on linguistic variables, are easier for experts and end-users to understand.
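For illustration, the following sketch applies the maximal non-dominance criterion (in the sense of the classical non-dominance degree over a fuzzy preference relation) to a pair-wise preference matrix; the function name and the example matrix are assumptions, and the fuzzy rule base of [19] that would produce such a matrix is omitted:

```python
import numpy as np

def max_non_dominance(r):
    """Pick the class with the maximal non-dominance degree.

    r[i, j] in [0, 1] is the fuzzy preference for class i over class j.
    The strict preference is r'[j, i] = max(r[j, i] - r[i, j], 0); the
    non-dominance degree of class i is 1 - max_j r'[j, i], i.e. how far
    class i is from being dominated by any other class.
    """
    strict = np.maximum(r - r.T, 0.0)          # strict preference relation
    non_dominance = 1.0 - strict.max(axis=0)   # 1 - worst domination of each class
    return int(np.argmax(non_dominance))

r = np.array([[0.0, 0.7, 0.4],
              [0.3, 0.0, 0.2],
              [0.6, 0.8, 0.0]])
print(max_non_dominance(r))  # class 2 is the least dominated
```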
6.2 Direct Multi-class Classification
Schapire [52] proved that a strong classifier can be generated by combining weak classifiers through boosting. This result originated the suite of AdaBoost algorithms [20]. The design of the classifier is iterative: in each iteration, a higher weight is assigned to the data samples that have not yet been accurately classified. AdaBoost can identify outliers, i.e. examples that are either mislabeled or inherently ambiguous and hard to categorize, and it is robust against over-fitting. However, the actual performance of boosting depends on the data and on the performance of the base learning algorithm. AdaBoost.MH and AdaBoost.MR were devised to deal with multi-class problems; AdaBoost.MH is based on minimization of the Hamming loss.
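The reweighting scheme described above can be sketched for the binary case as follows (the multi-class variants AdaBoost.MH and AdaBoost.MR build on the same idea); the decision-stump weak learner, the data set, and the round count are illustrative choices, not prescribed by the cited papers:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

def adaboost(X, y, n_rounds=20):
    """Minimal binary AdaBoost (labels in {-1, +1}) with stump learners."""
    n = len(y)
    w = np.full(n, 1.0 / n)            # uniform initial sample weights
    stumps, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = w[pred != y].sum()       # weighted training error
        if err >= 0.5:                 # weak learner no better than chance
            break
        alpha = 0.5 * np.log((1 - err) / (err + 1e-12))
        # Misclassified samples get a higher weight in the next round.
        w *= np.exp(-alpha * y * pred)
        w /= w.sum()
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas

def predict(stumps, alphas, X):
    score = sum(a * s.predict(X) for s, a in zip(stumps, alphas))
    return np.sign(score)

X, y = make_classification(n_samples=400, random_state=0)
y = 2 * y - 1                          # map {0, 1} labels to {-1, +1}
stumps, alphas = adaboost(X, y)
print("train accuracy:", (predict(stumps, alphas, X) == y).mean())
```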