6.1.3 Combining Binary Classifiers
This section presents a few of the methods designed to combine binary classifiers.
Wu et al. [72] review several of the methods for combining pair-wise classifiers.
The most commonly used set of methods is based on voting [21, 30]: first all the pair-wise comparisons are constructed, and then the class that won the highest number of these comparisons is selected. Majority vote, as in politics, comprises various methods that enable choosing one or more candidates and aim to resolve conflicts [40, 60, 72]. Voting selects only the class label, not the probability of this selection.
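To make the voting rule concrete, the following sketch (in Python; the function name and the toy decisions are illustrative, not taken from any of the cited papers) tallies the pair-wise winners and returns the class with the most wins:

```python
import numpy as np

def one_vs_one_vote(pairwise_decisions, n_classes):
    """Combine pair-wise decisions by majority vote.

    pairwise_decisions: dict mapping a class pair (i, j), i < j,
    to the winning class (i or j) returned by that binary classifier.
    Returns the class that won the most pair-wise comparisons.
    """
    votes = np.zeros(n_classes, dtype=int)
    for winner in pairwise_decisions.values():
        votes[winner] += 1
    return int(np.argmax(votes))  # ties broken by the lowest class index

# Example: 3 classes, hence 3 pair-wise classifiers
decisions = {(0, 1): 0, (0, 2): 2, (1, 2): 2}
print(one_vs_one_vote(decisions, n_classes=3))  # class 2 wins two comparisons
```

Ties are possible under pure voting; the sketch breaks them by the lowest class index, but any of the conflict-resolution methods cited above could be substituted.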
Another approach comprises various methods for the summation of the pair-wise probabilities [26, 72]. Allwein et al. [2] suggest an exponential-loss decoding. Wu et al. [72] present two methods: the first obtains the probability estimates via an approximate solution to an identity based on the theory of finite Markov chains; the second is a coupling method. Hastie and Tibshirani [26] use the Bradley-Terry model for combining binary classifiers.
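As an illustration of the coupling idea, the following sketch fits class probabilities to a matrix of pair-wise estimates with a Bradley-Terry-style iteration in the spirit of Hastie and Tibshirani [26]; the function name, the uniform pairing weights, and the example matrix are assumptions made for the sake of the example:

```python
import numpy as np

def couple_pairwise_probs(r, n_iter=100, tol=1e-8):
    """Couple pair-wise estimates into class probabilities (a sketch).

    r[i, j] is the estimated probability that class i beats class j
    (with r[j, i] = 1 - r[i, j]); the diagonal is ignored.
    Returns class probabilities p such that the Bradley-Terry ratios
    mu[i, j] = p_i / (p_i + p_j) approximate r.
    """
    k = r.shape[0]
    p = np.full(k, 1.0 / k)  # start from the uniform distribution
    for _ in range(n_iter):
        p_old = p.copy()
        for i in range(k):
            num = sum(r[i, j] for j in range(k) if j != i)
            den = sum(p[i] / (p[i] + p[j]) for j in range(k) if j != i)
            p[i] *= num / den  # multiplicative Bradley-Terry update
        p /= p.sum()
        if np.abs(p - p_old).max() < tol:
            break
    return p

# Example with 3 classes: class 2 tends to win its pair-wise contests
r = np.array([[0.0, 0.6, 0.2],
              [0.4, 0.0, 0.3],
              [0.8, 0.7, 0.0]])
print(couple_pairwise_probs(r))  # highest mass on class 2
```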
Shiraishi and Fukumizu [55] review methods for combining binary Support Vector Machines (SVMs); combining binary SVMs is computationally more feasible than, and not inferior to, direct multi-class SVMs. They propose a method for combining relatively strong binary classifiers that is based on statistical techniques such as penalized logistic regression, stacking, and a sparsity-promoting penalty, and that does not require the binary classifiers to return probabilistic values. The method is effective for both the one-against-one and one-against-all paradigms, and its benefit is that an estimate of the conditional probability of each class can be obtained. They also propose selecting only the relevant binary classifiers by adding a group-lasso-type penalty while training the combining method.
Fernández et al. [19] present fuzzy rule-based pair-wise classification, in which the combination stage is treated as a decision-making problem solved by rules based on the maximal non-dominance criterion. Fuzzy rule-based classification systems combine fuzzy logic with statistical machine learning. They are widely used to solve classification problems (in medicine, unmanned vehicles, battlefield analysis, and intrusion detection) because their interpretable models, based on linguistic variables, are easier for experts and end-users to understand.
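For illustration, the following sketch applies the maximal non-dominance criterion (in the sense of the classical non-dominance degree over a fuzzy preference relation) to a pair-wise preference matrix; the function name and the example matrix are assumptions, and the fuzzy rule base of [19] that would produce such a matrix is omitted:

```python
import numpy as np

def max_non_dominance(r):
    """Pick the class with the maximal non-dominance degree.

    r[i, j] in [0, 1] is the fuzzy preference for class i over class j.
    The strict preference is r'[j, i] = max(r[j, i] - r[i, j], 0); the
    non-dominance degree of class i is 1 - max_j r'[j, i], i.e. how far
    class i is from being dominated by any other class.
    """
    strict = np.maximum(r - r.T, 0.0)          # strict preference relation
    non_dominance = 1.0 - strict.max(axis=0)   # 1 - worst domination of each class
    return int(np.argmax(non_dominance))

r = np.array([[0.0, 0.7, 0.4],
              [0.3, 0.0, 0.2],
              [0.6, 0.8, 0.0]])
print(max_non_dominance(r))  # class 2 is the least dominated
```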
6.2 Direct Multi-class Classification
Schapire [52] proved that a strong classifier can be generated by combining weak classifiers through boosting. This result originated the suite of AdaBoost algorithms [20]. The design of the classifier is iterative: in each iteration, a higher weight is assigned to the data samples that have not yet been accurately classified. AdaBoost can identify outliers, i.e. examples that are either mislabeled or inherently ambiguous and hard to categorize, and it is robust against over-fitting. However, the actual performance of boosting depends on the data and on the performance of the base learning algorithm. AdaBoost.MH and AdaBoost.MR were devised to deal with multi-class problems; AdaBoost.MH is based on minimization of the Hamming loss.
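The reweighting scheme described above can be sketched for the binary case as follows (the multi-class variants AdaBoost.MH and AdaBoost.MR build on the same idea); the decision-stump weak learner, the data set, and the round count are illustrative choices, not prescribed by the cited papers:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

def adaboost(X, y, n_rounds=20):
    """Minimal binary AdaBoost (labels in {-1, +1}) with stump learners."""
    n = len(y)
    w = np.full(n, 1.0 / n)            # uniform initial sample weights
    stumps, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = w[pred != y].sum()       # weighted training error
        if err >= 0.5:                 # weak learner no better than chance
            break
        alpha = 0.5 * np.log((1 - err) / (err + 1e-12))
        # Misclassified samples get a higher weight in the next round.
        w *= np.exp(-alpha * y * pred)
        w /= w.sum()
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas

def predict(stumps, alphas, X):
    score = sum(a * s.predict(X) for s, a in zip(stumps, alphas))
    return np.sign(score)

X, y = make_classification(n_samples=400, random_state=0)
y = 2 * y - 1                          # map {0, 1} labels to {-1, +1}
stumps, alphas = adaboost(X, y)
print("train accuracy:", (predict(stumps, alphas, X) == y).mean())
```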