digit recognition problem were shown to be linearly separable when considered in
pairs, making this a simpler alternative to learning a single non-linear classifier
over all classes simultaneously.
• Classification algorithms whose extension to multi-class problems is not straightforward can still address such problems using decomposition techniques [20].
• In [71], the advantages of using decomposition were pointed out for the case where the classification errors for different classes have distinct costs: binarization allows the generated binary classifiers to impose preferences for some of the classes.
• Decomposition allows one to easily parallelize the classifier learning, since the binary subproblems are independent and can be solved with different processors.
Dividing a problem into several new subproblems, which are then independently
solved, implies the need for a second phase in which the outputs of each subproblem
are aggregated. Therefore, decomposition includes two steps:
1. Problem division. The problem is decomposed into several binary subproblems, which are solved by independent binary classifiers, called base classifiers [20]. Different decomposition strategies can be found in the literature [55]; the most common one is OVO [50].
2. Combination of the outputs. The different outputs of the binary classifiers must be aggregated in order to produce the final class prediction. In [21], an exhaustive study comparing different methods to combine the outputs of the base classifiers in the OVO and OVA strategies is presented. Among these combination methods, the Weighted Voting [40] and the approaches in the framework of probability estimates [95] are highlighted.
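The two steps above can be sketched in Python. The nearest-centroid base classifier and the function names below are illustrative assumptions, not the chapter's notation; any binary learner could play the role of base classifier, and the combination step here uses simple majority voting rather than the weighted or probabilistic schemes cited above.

```python
# Sketch of decomposition: pairwise base classifiers (problem division)
# plus aggregation of their outputs by majority voting (combination).
# NearestCentroidBinary is a toy stand-in for any binary base classifier.
from itertools import combinations
import numpy as np

class NearestCentroidBinary:
    """Toy base classifier: predicts the class whose centroid is closest."""
    def fit(self, X, y):
        self.labels_ = np.unique(y)
        self.centroids_ = np.array([X[y == c].mean(axis=0) for c in self.labels_])
        return self
    def predict_one(self, x):
        d = np.linalg.norm(self.centroids_ - x, axis=1)
        return self.labels_[np.argmin(d)]

def train_ovo(X, y):
    """Step 1 (problem division): one base classifier per pair of classes,
    trained only on the examples belonging to that pair."""
    classifiers = {}
    for ci, cj in combinations(np.unique(y), 2):
        mask = (y == ci) | (y == cj)          # keep only the two classes
        classifiers[(ci, cj)] = NearestCentroidBinary().fit(X[mask], y[mask])
    return classifiers

def predict_ovo(classifiers, x):
    """Step 2 (combination): aggregate the pairwise outputs by voting."""
    votes = {}
    for clf in classifiers.values():
        winner = clf.predict_one(x)
        votes[winner] = votes.get(winner, 0) + 1
    return max(votes, key=votes.get)
```

Note that each base classifier votes for one of its two classes only; classes it was never trained on receive no vote from it, which is what makes the aggregation phase necessary.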
This section focuses on the OVO decomposition strategy due to the several advantages shown in the literature with respect to OVA [20, 21, 37, 76]:
• OVO creates simpler borders between classes than OVA.
• OVO generally obtains a higher classification accuracy and a shorter training time than OVA, because the new subproblems are easier and smaller.
• OVA has a greater tendency to create imbalanced data sets, which can be counterproductive [22, 83].
• The OVO strategy is in widespread use, and most software tools that include binarization techniques adopt it as the default [4, 13, 28].
5.4.2.2 One-vs-One Decomposition Scheme
The OVO decomposition strategy consists of dividing a classification problem with
M classes into M(M − 1)/2 binary subproblems. A classifier is trained for each new
subproblem, considering only the examples from the training data corresponding to
the pair of classes (λi, λj) considered.
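As a quick check of the M(M − 1)/2 count, the pairwise subproblems can be enumerated directly; the class labels used here are illustrative:

```python
# Enumerate the binary subproblems of an OVO decomposition for M classes:
# one subproblem per unordered pair (λi, λj) with i < j.
from itertools import combinations

M = 4  # number of classes (illustrative)
pairs = list(combinations(range(M), 2))

print(pairs)       # [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
print(len(pairs))  # 6, which equals M * (M - 1) // 2
```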
When a new instance is going to be classified, it is presented to all the binary
classifiers. In this way, each classifier discriminating between classes λi and λj,
with i < j, provides a confidence degree rij ∈ [0, 1] in favor of the former class (and hence, rji is