same phenomenon. Cohen's kappa can be adapted to classification tasks, and its use is recommended because it takes random successes into consideration as a standard, in the same way as the AUC measure (Ben-David 2007). It is also implemented in some well-known software packages, such as Weka [28], SAS and SPSS.
An easy way of computing Cohen's kappa is to use the confusion matrix resulting from a classification task. Specifically, Cohen's kappa can be obtained using the following expression:
kappa = \frac{n \sum_{i=1}^{C} x_{ii} - \sum_{i=1}^{C} x_{i\cdot}\, x_{\cdot i}}{n^{2} - \sum_{i=1}^{C} x_{i\cdot}\, x_{\cdot i}}        (2.1)

where x_ii is the cell count in the main diagonal, n is the number of examples in the data set, C is the number of class labels, and x_i· , x_·i are the row and column total counts, respectively.
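As an illustrative sketch (not part of the original text), Cohen's kappa can be computed directly from a confusion matrix following Eq. (2.1). The Python code below assumes that rows correspond to the true classes and columns to the predicted classes; the example matrix is purely hypothetical.

import numpy as np

def cohen_kappa(conf_matrix):
    # Cohen's kappa from a C x C confusion matrix, following Eq. (2.1).
    cm = np.asarray(conf_matrix, dtype=float)
    n = cm.sum()                                       # number of examples in the data set
    diag = np.trace(cm)                                # sum of x_ii over the main diagonal
    chance = np.dot(cm.sum(axis=1), cm.sum(axis=0))    # sum_i of x_i. * x_.i
    return (n * diag - chance) / (n ** 2 - chance)

# Hypothetical 3-class confusion matrix (rows: true class, columns: predicted class)
cm = [[50,  2,  3],
      [ 5, 40,  5],
      [ 2,  3, 45]]
print(cohen_kappa(cm))   # approximately 0.806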
Cohen's kappa ranges from −1 (total disagreement) through 0 (random classification) to 1 (perfect agreement). Being a scalar, it is less expressive than ROC curves when applied to binary classification. However, for multi-class problems, kappa is a very useful, yet simple, measure of the accuracy of a classifier that compensates for random successes.
The main difference between the classification rate and Cohen's kappa lies in how correct classifications are scored. The classification rate scores all the successes over all classes, whereas Cohen's kappa scores the successes independently for each class and then aggregates them. The second way of scoring is less sensitive to the randomness caused by a different number of examples in each class, which biases the learner towards obtaining data-dependent models.
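To make this difference concrete, consider a hypothetical imbalanced two-class problem (90 examples of one class, 10 of the other) in which the classifier predicts the majority class almost exclusively. The short Python sketch below (with invented counts, not taken from the original text) shows that the classification rate stays high while kappa drops close to the random level:

import numpy as np

# Hypothetical imbalanced confusion matrix: the classifier nearly always
# predicts the majority class.
cm = np.array([[89, 1],
               [ 9, 1]], dtype=float)

n = cm.sum()
accuracy = np.trace(cm) / n                              # classification rate
chance = np.dot(cm.sum(axis=1), cm.sum(axis=0))          # sum_i of x_i. * x_.i
kappa = (n * np.trace(cm) - chance) / (n ** 2 - chance)  # Eq. (2.1)
print(accuracy, kappa)   # 0.90 versus roughly 0.14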
2.2 Using Statistical Tests to Compare Methods
Using raw performance measures to compare different ML methods and to establish a ranking among them is discouraged. It has recently been shown [5, 10] that tools of a statistical nature must be used in order to obtain meaningful and lasting conclusions.
In recent years, there has been a growing interest in experimental analysis in the field of DM. This is noticeable in the numerous papers that analyze and propose different aspects of the issue, such as the foundations of experimental comparisons of algorithms, methodologies for carrying out such comparisons, or the use of different statistical techniques in algorithm comparisons.
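As one possible illustration of such statistical techniques (the accuracy values below are invented for the example and do not come from the original text), nonparametric tests such as the Wilcoxon signed-rank test (for pairwise comparisons) or the Friedman test (for several algorithms at once) can be applied to results obtained over a collection of data sets:

from scipy.stats import wilcoxon, friedmanchisquare

# Hypothetical accuracies of three algorithms over ten data sets
acc_a = [0.81, 0.75, 0.90, 0.68, 0.77, 0.83, 0.72, 0.88, 0.79, 0.85]
acc_b = [0.78, 0.74, 0.86, 0.70, 0.73, 0.80, 0.69, 0.84, 0.76, 0.82]
acc_c = [0.80, 0.72, 0.88, 0.66, 0.75, 0.81, 0.71, 0.86, 0.77, 0.83]

# Pairwise comparison of two algorithms over the same data sets
stat, p = wilcoxon(acc_a, acc_b)
print("Wilcoxon A vs B:", stat, p)

# Comparison of several algorithms simultaneously
stat, p = friedmanchisquare(acc_a, acc_b, acc_c)
print("Friedman A/B/C:", stat, p)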
The “no free lunch” theorem [29] demonstrates that no single algorithm can behave better than all others on every problem. On the other hand, we know that we can work with different degrees of knowledge about the problem we expect to solve, and that it is not the same to work without knowledge of the problem (hypothesis of
 