same phenomenon. Cohen's kappa can be adapted to classification tasks, and its use is recommended because it takes random successes into consideration as a standard, in the same way as the AUC measure (Ben-David 2007). It is also implemented in some well-known software packages, such as Weka [28], SAS and SPSS.
An easy way of computing Cohen's kappa is to use the confusion matrix resulting from a classification task. Specifically, Cohen's kappa can be obtained using the following expression:
kappa = \frac{n \sum_{i=1}^{C} x_{ii} - \sum_{i=1}^{C} x_{i\cdot}\, x_{\cdot i}}{n^{2} - \sum_{i=1}^{C} x_{i\cdot}\, x_{\cdot i}}        (2.1)

where x_ii is the cell count in the main diagonal, n is the number of examples in the data set, C is the number of class labels, and x_i· , x_·i are the row and column total counts, respectively.
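As an illustrative sketch (not part of the original text), Cohen's kappa can be computed directly from a confusion matrix following Eq. (2.1). The Python code below assumes that rows correspond to the true classes and columns to the predicted classes; the example matrix is purely hypothetical.

import numpy as np

def cohen_kappa(conf_matrix):
    # Cohen's kappa from a C x C confusion matrix, following Eq. (2.1).
    cm = np.asarray(conf_matrix, dtype=float)
    n = cm.sum()                                       # number of examples in the data set
    diag = np.trace(cm)                                # sum of x_ii over the main diagonal
    chance = np.dot(cm.sum(axis=1), cm.sum(axis=0))    # sum_i of x_i. * x_.i
    return (n * diag - chance) / (n ** 2 - chance)

# Hypothetical 3-class confusion matrix (rows: true class, columns: predicted class)
cm = [[50,  2,  3],
      [ 5, 40,  5],
      [ 2,  3, 45]]
print(cohen_kappa(cm))   # approximately 0.806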
Cohen's kappa ranges from −1 (total disagreement) through 0 (random classification) to 1 (perfect agreement). Being a scalar, it is less expressive than ROC curves when applied to binary classification. However, for multi-class problems, kappa is a very useful, yet simple, measure of the accuracy of a classifier that compensates for random successes.
The main difference between the classification rate and Cohen's kappa lies in how correct classifications are scored. The classification rate scores all the successes over all classes, whereas Cohen's kappa scores the successes independently for each class and then aggregates them. The second way of scoring is less sensitive to the randomness caused by a different number of examples in each class, which biases the learner towards obtaining data-dependent models.
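To make this difference concrete, consider a hypothetical imbalanced two-class problem (90 examples of one class, 10 of the other) in which the classifier predicts the majority class almost exclusively. The short Python sketch below (with invented counts, not taken from the original text) shows that the classification rate stays high while kappa drops close to the random level:

import numpy as np

# Hypothetical imbalanced confusion matrix: the classifier nearly always
# predicts the majority class.
cm = np.array([[89, 1],
               [ 9, 1]], dtype=float)

n = cm.sum()
accuracy = np.trace(cm) / n                              # classification rate
chance = np.dot(cm.sum(axis=1), cm.sum(axis=0))          # sum_i of x_i. * x_.i
kappa = (n * np.trace(cm) - chance) / (n ** 2 - chance)  # Eq. (2.1)
print(accuracy, kappa)   # 0.90 versus roughly 0.14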
2.2 Using Statistical Tests to Compare Methods
Using raw performance measures to compare different ML methods and to establish a ranking among them is discouraged. It has recently been shown [5, 10] that tools of a statistical nature must be used in order to obtain meaningful and lasting conclusions.
In recent years, there has been a growing interest in experimental analysis in the field of DM. This is noticeable in the numerous papers that analyze and propose different aspects of the issue, such as the foundations of experimental comparisons of algorithms, methodologies for carrying out such comparisons, or the use of different statistical techniques in algorithm comparisons.
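As one possible illustration of such statistical techniques (the accuracy values below are invented for the example and do not come from the original text), nonparametric tests such as the Wilcoxon signed-rank test (for pairwise comparisons) or the Friedman test (for several algorithms at once) can be applied to results obtained over a collection of data sets:

from scipy.stats import wilcoxon, friedmanchisquare

# Hypothetical accuracies of three algorithms over ten data sets
acc_a = [0.81, 0.75, 0.90, 0.68, 0.77, 0.83, 0.72, 0.88, 0.79, 0.85]
acc_b = [0.78, 0.74, 0.86, 0.70, 0.73, 0.80, 0.69, 0.84, 0.76, 0.82]
acc_c = [0.80, 0.72, 0.88, 0.66, 0.75, 0.81, 0.71, 0.86, 0.77, 0.83]

# Pairwise comparison of two algorithms over the same data sets
stat, p = wilcoxon(acc_a, acc_b)
print("Wilcoxon A vs B:", stat, p)

# Comparison of several algorithms simultaneously
stat, p = friedmanchisquare(acc_a, acc_b, acc_c)
print("Friedman A/B/C:", stat, p)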
The “no free lunch” theorem [29] demonstrates that no single algorithm can behave better than all others on every problem. On the other hand, we know that we can work with different degrees of knowledge about the problem we expect to solve, and that it is not the same to work without knowledge of the problem (hypothesis of
 