Discretization - Data Preprocessing in Data Mining - page 261


Table 9.3 Parameters of the discretizers and classifiers

Method        Parameters
C4.5          Pruned tree, confidence = 0.25, 2 examples per leaf
DataSqueezer  Pruning and generalization threshold = 0.05
KNN           K = 3, HVDM distance
PUBLIC        25 nodes between prune
Ripper        k = 2, grow set = 0.66
1R            6 examples of the same class per interval
CADD          Confidence threshold = 0.01
Chi2          Inconsistency threshold = 0.02
ChiMerge      Confidence threshold = 0.05
FDD           Frequency size = 30
FUSINTER      α = 0.975, λ = 1
HDD           Coefficient = 0.8
IDD           Neighborhood = 3, windows size = 3, nominal distance
MODL          Optimized process type
UCPD          Intervals = [3, 6], KNN map type, neighborhood = 6,
              minimum support = 25, merged threshold = 0.5,
              scaling factor = 0.5, use discrete

The data sets considered are partitioned using the 10-fold cross-validation (10-FCV) procedure. The parameters of the discretizers and classifiers are those recommended by their respective authors, and they are specified in Table 9.3 for the methods that require them. We assume that these authors chose the parameter values optimally. Nevertheless, for discretizers that require the number of intervals as an input parameter, we use a rule of thumb that depends on the number of instances in the data set: divide the number of instances by 100 and take the maximum of this result and the number of classes. All discretizers and classifiers are run once on each partition because they are non-stochastic.
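The rule of thumb for the number of intervals can be sketched in a few lines of Python. This is only an illustration: the function name default_num_intervals is hypothetical, and the text does not say how the division by 100 is rounded, so the floor division used here is an assumption.

```python
def default_num_intervals(n_instances: int, n_classes: int) -> int:
    """Rule of thumb: max(number of instances / 100, number of classes).

    Assumption: the quotient is rounded down (floor division); the text
    does not specify the rounding.
    """
    if n_instances < 0 or n_classes < 1:
        raise ValueError("need n_instances >= 0 and n_classes >= 1")
    return max(n_instances // 100, n_classes)


# A small data set: 150 // 100 = 1, so the number of classes (3) wins.
print(default_num_intervals(150, 3))    # 3
# A larger data set: 5000 // 100 = 50 exceeds the number of classes (2).
print(default_num_intervals(5000, 2))   # 50
```

Under this reading, the number of classes acts as a lower bound, so small data sets still receive at least one interval per class.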

Two performance measures are widely used because of their simplicity and successful application to multi-class classification problems: accuracy and Cohen's kappa [31]. We adopt both to measure the efficacy of the discretizers in terms of the generalization classification rate. Cohen's kappa was explained in Chap. 2.
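As a reminder of how the two measures relate, the following minimal sketch computes both from a list of true and predicted labels. It uses the standard definition of Cohen's kappa, (p_o - p_e) / (1 - p_e), where p_o is the observed agreement (the accuracy) and p_e is the agreement expected by chance. The function names and the toy labels are illustrative assumptions, not part of the study.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of examples whose predicted label equals the true label."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    n = len(y_true)
    p_o = accuracy(y_true, y_pred)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement: sum over classes of (true share) * (predicted share).
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1.0:
        # Only one class occurs in both lists; kappa is undefined here.
        return float("nan")
    return (p_o - p_e) / (1.0 - p_e)


# Toy labels (hypothetical): 3 of 4 predictions are correct.
y_true = ["a", "a", "b", "b"]
y_pred = ["a", "a", "b", "a"]
print(accuracy(y_true, y_pred))     # 0.75
print(cohen_kappa(y_true, y_pred))  # 0.5
```

In the toy example the chance agreement is (2*3 + 2*1) / 16 = 0.5, so an accuracy of 0.75 corresponds to a kappa of only 0.5, which shows how kappa discounts the hits that could be obtained by chance.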

The empirical study involves 30 discretization methods from those listed in Table 9.1. We want to point out that the implementations are based only on the descriptions and specifications given by the respective authors in their papers.
