A Data Mining Software Package Including Data Preparation and Reduction: KEEL - Data Preprocessing in Data Mining

next section we will explain in detail how to encode a simple algorithm within the KEEL software tool.

10.5 KEEL Statistical Tests

Nowadays, the use of statistical tests to improve the evaluation of the performance of a new method has become a widespread technique in the field of DM [34-36]. Usually, they are employed within the framework of an experimental analysis to decide whether one algorithm is better than another. This task, which may not be trivial, has become necessary to confirm whether a newly proposed method offers a significant improvement over the existing methods for a given problem.

Two kinds of tests exist: parametric and non-parametric, depending on the concrete type of data employed. As a general rule, a non-parametric test is less restrictive than a parametric one, although it is less powerful than a parametric test when the data are well conditioned.

Parametric tests have been commonly used in the analysis of experiments in DM. For example, a common way to test whether the difference between the results of two algorithms is non-random is to compute a paired t-test, which checks whether the average difference in their performance over the data sets is significantly different from zero. When comparing multiple algorithms, the common statistical method for testing the differences between more than two related sample means is the repeated-measures ANOVA (or within-subjects ANOVA) [37]. Unfortunately, parametric tests are based on assumptions that are most probably violated when analyzing the performance of computational intelligence and DM algorithms [38-40]. These assumptions are independence, normality, and homoscedasticity.
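As a minimal sketch of the paired t-test mentioned above, the statistic can be computed from the per-data-set differences between two algorithms. The function name `paired_t_statistic` and the accuracy values are hypothetical and chosen only for illustration; this is not KEEL's implementation. The statistic is only meaningful if the assumptions listed above hold.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees_of_freedom) for a paired t-test on two score lists."""
    if len(scores_a) != len(scores_b):
        raise ValueError("both algorithms need one score per data set")
    # One difference per data set: the test works on these paired differences.
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    if n < 2:
        raise ValueError("at least two data sets are required")
    sd = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    # t = mean difference divided by its standard error.
    t = mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1


# Hypothetical accuracies of two classifiers on five data sets (illustrative only).
acc_a = [0.80, 0.75, 0.90, 0.85, 0.70]
acc_b = [0.78, 0.70, 0.88, 0.80, 0.69]
t_stat, dof = paired_t_statistic(acc_a, acc_b)
print(f"t = {t_stat:.3f}, degrees of freedom = {dof}")
```

The resulting value would be compared against Student's t distribution with n - 1 degrees of freedom to decide whether the average difference is significantly different from zero.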

Non-parametric tests can be employed in the analysis of experiments, providing the researcher with a practical tool to use when the previous assumptions cannot be satisfied. Although they were originally designed for dealing with nominal or ordinal data, it is possible to apply ranking-based transformations to adjust the input data to the test requirements. Several non-parametric methods for pairwise and multiple comparisons are available to adequately contrast the results obtained in any Computational Intelligence experiment. A broad description of the topic, with examples, case studies, and bibliographic recommendations, can be found on the SCI2S thematic public website on Statistical Inference in Computational Intelligence and Data Mining.17
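To illustrate the ranking-based transformation mentioned above, the following sketch ranks the algorithms on each data set and then computes the Friedman statistic, one well-known non-parametric test for multiple comparisons. The text does not say which tests KEEL implements, so the choice of Friedman here is an assumption, as are the function names and the scores (hypothetical, for illustration only). No tie correction is applied to the statistic.

```python
def average_ranks(values):
    """Rank scores from best (1) to worst; higher is better, ties share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next score ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of the 1-based positions in the group
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def friedman_statistic(results):
    """results[i][j] is the score of algorithm j on data set i.

    Returns (chi2, mean_ranks), where chi2 is the uncorrected Friedman statistic.
    """
    n = len(results)      # number of data sets
    k = len(results[0])   # number of algorithms
    rank_rows = [average_ranks(row) for row in results]
    mean_ranks = [sum(row[j] for row in rank_rows) / n for j in range(k)]
    chi2 = 12 * n / (k * (k + 1)) * (
        sum(r * r for r in mean_ranks) - k * (k + 1) ** 2 / 4
    )
    return chi2, mean_ranks


# Hypothetical accuracies: 4 data sets (rows) x 3 algorithms (columns).
scores = [
    [0.90, 0.85, 0.80],
    [0.70, 0.72, 0.65],
    [0.88, 0.80, 0.79],
    [0.95, 0.90, 0.91],
]
chi2, mean_ranks = friedman_statistic(scores)
print("mean ranks:", mean_ranks)
print("Friedman chi-square:", chi2)
```

Because only the ranks enter the statistic, the test does not rely on normality or homoscedasticity of the raw scores. The statistic is usually compared against a chi-square distribution with k - 1 degrees of freedom.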

KEEL is one of the few DM software tools that provide the researcher with a complete set of statistical procedures for pairwise and multiple comparisons. Within the KEEL environment, several parametric and non-parametric procedures have been coded, which should help to contrast the results obtained in any experiment performed with the software tool. These tests follow the same methodology as the rest of the elements of KEEL, facilitating both their use and their integration into a complete experimental study.

17 http://sci2s.ugr.es/sicidm/
