digit recognition problem were shown to be linearly separable when considered in
pairs, making this a simpler alternative to learning a single non-linear classifier
over all classes simultaneously.
• Classification algorithms whose extension to multi-class problems is not straightforward can still address such problems using decomposition techniques [20].
• In [71], the advantages of using decomposition were pointed out for the case where the classification errors for different classes have distinct costs: binarization allows the generated binary classifiers to impose preferences for some of the classes.
• Decomposition allows one to easily parallelize the classifier learning, since the binary subproblems are independent and can be solved with different processors.
Dividing a problem into several new subproblems, which are then independently
solved, implies the need for a second phase in which the outputs of each subproblem
are aggregated. Therefore, decomposition includes two steps:
1. Problem division. The problem is decomposed into several binary subproblems, which are solved by independent binary classifiers, called base classifiers [20]. Different decomposition strategies can be found in the literature [55]; the most common one is OVO [50].
2. Combination of the outputs. The different outputs of the binary classifiers must be aggregated in order to produce the final class prediction. In [21], an exhaustive study comparing different methods to combine the outputs of the base classifiers in the OVO and OVA strategies is presented. Among these combination methods, the Weighted Voting [40] and the approaches in the framework of probability estimates [95] are highlighted.
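The two steps above can be sketched in Python. The nearest-centroid base classifier and the function names below are illustrative assumptions, not the chapter's notation; any binary learner could play the role of base classifier, and the combination step here uses simple majority voting rather than the weighted or probabilistic schemes cited above.

```python
# Sketch of decomposition: pairwise base classifiers (problem division)
# plus aggregation of their outputs by majority voting (combination).
# NearestCentroidBinary is a toy stand-in for any binary base classifier.
from itertools import combinations
import numpy as np

class NearestCentroidBinary:
    """Toy base classifier: predicts the class whose centroid is closest."""
    def fit(self, X, y):
        self.labels_ = np.unique(y)
        self.centroids_ = np.array([X[y == c].mean(axis=0) for c in self.labels_])
        return self
    def predict_one(self, x):
        d = np.linalg.norm(self.centroids_ - x, axis=1)
        return self.labels_[np.argmin(d)]

def train_ovo(X, y):
    """Step 1 (problem division): one base classifier per pair of classes,
    trained only on the examples belonging to that pair."""
    classifiers = {}
    for ci, cj in combinations(np.unique(y), 2):
        mask = (y == ci) | (y == cj)          # keep only the two classes
        classifiers[(ci, cj)] = NearestCentroidBinary().fit(X[mask], y[mask])
    return classifiers

def predict_ovo(classifiers, x):
    """Step 2 (combination): aggregate the pairwise outputs by voting."""
    votes = {}
    for clf in classifiers.values():
        winner = clf.predict_one(x)
        votes[winner] = votes.get(winner, 0) + 1
    return max(votes, key=votes.get)
```

Note that each base classifier votes for one of its two classes only; classes it was never trained on receive no vote from it, which is what makes the aggregation phase necessary.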
This section focuses on the OVO decomposition strategy due to the several advantages shown in the literature with respect to OVA [20, 21, 37, 76]:
• OVO creates simpler borders between classes than OVA.
• OVO generally obtains a higher classification accuracy and a shorter training time than OVA, because the new subproblems are easier and smaller.
• OVA has a greater tendency to create imbalanced data sets, which can be counterproductive [22, 83].
• The OVO strategy is in widespread use, and most software tools that include binarization techniques adopt it as the default [4, 13, 28].
5.4.2.2 One-vs-One Decomposition Scheme
The OVO decomposition strategy consists of dividing a classification problem with
M classes into M(M − 1)/2 binary subproblems. A classifier is trained for each new
subproblem, considering only the examples from the training data corresponding to
the pair of classes (λi, λj) considered.
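As a quick check of the M(M − 1)/2 count, the pairwise subproblems can be enumerated directly; the class labels used here are illustrative:

```python
# Enumerate the binary subproblems of an OVO decomposition for M classes:
# one subproblem per unordered pair (λi, λj) with i < j.
from itertools import combinations

M = 4  # number of classes (illustrative)
pairs = list(combinations(range(M), 2))

print(pairs)       # [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
print(len(pairs))  # 6, which equals M * (M - 1) // 2
```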
When a new instance is going to be classified, it is presented to all the binary
classifiers. In this way, each classifier discriminating between classes λi and λj,
with i < j, provides a confidence degree rij ∈ [0, 1] in favor of the former class (and hence, rji is