digit recognition problem were shown to be linearly separable when considered in
pairs, which is a simpler alternative to learning a single non-linear classifier
over all classes simultaneously.
Classification algorithms that cannot be easily extended to multiple classes can
still address multi-class problems by means of decomposition techniques [20].
In [71], the advantages of using decomposition were pointed out for the case in which the
classification errors for different classes have distinct costs: the binarization allows the
generated binary classifiers to impose preferences for some of the classes.
Decomposition allows one to easily parallelize the classifier learning, since the
binary subproblems are independent and can be solved with different processors.
Dividing a problem into several new subproblems, which are then independently
solved, implies the need for a second phase in which the outputs of each subproblem
are aggregated. Therefore, decomposition includes two steps (see the sketch after the list below):
1. Problem division. The problem is decomposed into several binary subproblems,
which are solved by independent binary classifiers, called base classifiers [20].
Different decomposition strategies can be found in the literature [55]. The most
common one is OVO [50].
2. Combination of the outputs. The different outputs of the binary classifiers
must be aggregated in order to obtain the final class prediction. In [21], an exhaustive
study comparing different methods to combine the outputs of the base clas-
sifiers in the OVO and OVA strategies is carried out. Among these combination
methods, the Weighted Voting [40] and the approaches within the framework of prob-
ability estimates [95] are highlighted.
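The following is a minimal sketch of these two steps, assuming scikit-learn's LogisticRegression as the base classifier (any binary classifier would do) and Weighted Voting as the combination method; the helper names train_ovo and predict_ovo are hypothetical, chosen only for illustration.

```python
from itertools import combinations

import numpy as np
from sklearn.linear_model import LogisticRegression


def train_ovo(X, y):
    """Step 1 (problem division): one base classifier per pair of classes."""
    classifiers = {}
    for ci, cj in combinations(np.unique(y), 2):
        mask = (y == ci) | (y == cj)   # keep only the examples of the two classes
        classifiers[(ci, cj)] = LogisticRegression().fit(X[mask], y[mask])
    return classifiers


def predict_ovo(classifiers, x):
    """Step 2 (combination of the outputs): aggregate by weighted voting."""
    votes = {}
    for (ci, cj), clf in classifiers.items():
        p = clf.predict_proba(x.reshape(1, -1))[0]   # confidence for each of the two classes
        votes[ci] = votes.get(ci, 0.0) + p[list(clf.classes_).index(ci)]
        votes[cj] = votes.get(cj, 0.0) + p[list(clf.classes_).index(cj)]
    return max(votes, key=votes.get)   # class with the largest accumulated confidence
```

Given a feature matrix X of shape (n_samples, n_features) and a label vector y, train_ovo(X, y) builds the base classifiers and predict_ovo(classifiers, x) returns the predicted class for a single instance x; since the pairwise subproblems are independent, the loop in train_ovo could also be run in parallel.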
This section focuses on the OVO decomposition strategy due to several advantages
shown in the literature with respect to OVA [20, 21, 37, 76]:
• OVO creates simpler borders between classes than OVA.
• OVO generally obtains a higher classification accuracy and a shorter training time
than OVA because the new subproblems are easier and smaller.
• OVA has more of a tendency to create imbalanced data sets, which can be
counterproductive [22, 83].
The application of the OVO strategy is widespread, and most of the software
tools that consider binarization techniques use it by default [4, 13, 28].
5.4.2.2 One-vs-One Decomposition Scheme
The OVO decomposition strategy consists of dividing a classification problem with
M classes into M(M − 1)/2 binary subproblems. A classifier is trained for each new
subproblem, considering only the examples from the training data corresponding to
the pair of classes (λ_i, λ_j) considered.
When a new instance is going to be classified, it is presented to all the binary
classifiers. This way, each classifier discriminating between classes λ_i and λ_j,
with i < j, provides a confidence degree r_ij ∈ [0, 1] in favor of the former class
(and hence, r_ji is computed as 1 − r_ij if the classifier does not provide it).
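To make the score matrix concrete, a small sketch is given below; it reuses the hypothetical classifiers dictionary from the earlier sketch, and the helper name score_matrix is likewise an assumption made only for illustration. It fills R = (r_ij) for a single instance, setting r_ji = 1 − r_ij, and then applies weighted voting by summing each row of R.

```python
import numpy as np


def score_matrix(classifiers, classes, x):
    """Build R = (r_ij), where r_ij in [0, 1] is the confidence in favor of
    class lambda_i for the pair (lambda_i, lambda_j), and r_ji = 1 - r_ij."""
    index = {c: k for k, c in enumerate(classes)}
    R = np.zeros((len(classes), len(classes)))
    for (ci, cj), clf in classifiers.items():
        p = clf.predict_proba(x.reshape(1, -1))[0]
        r_ij = p[list(clf.classes_).index(ci)]   # confidence in favor of the former class
        R[index[ci], index[cj]] = r_ij
        R[index[cj], index[ci]] = 1.0 - r_ij     # complementary confidence for the latter class
    return R


# Weighted voting over the score matrix: sum each row and predict the class
# with the largest accumulated confidence.
# predicted = classes[np.argmax(score_matrix(classifiers, classes, x).sum(axis=1))]
```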