Another problem that was fixed relates to the merging criterion used in Chi2, which was
not very accurate and led to overmerging. The original Chi2 algorithm computes the
χ² value using an initially predefined degree of freedom (the number of classes minus
one). From the point of view of statistics this is inaccurate, because the degrees of freedom
may change according to the two adjacent intervals to be merged. This fact may
change the order of merging and affect the inconsistency obtained after the discretization.
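The point can be illustrated with a small sketch, shown below. It assumes NumPy and SciPy are available; the function name and the interval counts are made up for illustration. The χ² statistic for a pair of adjacent intervals is compared against a threshold whose degrees of freedom are derived from the classes actually present in those two intervals, rather than from a number fixed in advance.

```python
import numpy as np
from scipy.stats import chi2 as chi2_dist

def chi2_for_merge(counts_a, counts_b, alpha=0.05):
    """Chi-square test for two adjacent intervals (illustrative names).

    counts_a, counts_b: per-class example counts of each interval.
    The degrees of freedom are taken from the classes actually observed
    in this pair of intervals, not from a value fixed at the start.
    """
    table = np.array([counts_a, counts_b], dtype=float)
    # drop classes that never occur in either of the two intervals
    table = table[:, table.sum(axis=0) > 0]
    expected = (table.sum(axis=1, keepdims=True)
                * table.sum(axis=0, keepdims=True) / table.sum())
    stat = ((table - expected) ** 2 / expected).sum()
    dof = max(table.shape[1] - 1, 1)           # classes present minus one
    threshold = chi2_dist.ppf(1.0 - alpha, dof)
    return stat, threshold, stat <= threshold  # True: the pair is a merge candidate

# Made-up counts over three classes; class 3 is absent from both intervals,
# so dof = 2 - 1 = 1 instead of the predefined 3 - 1 = 2.
print(chi2_for_merge([5, 3, 0], [4, 6, 0]))
```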
FUSINTER [126]
This method uses the same strategy as the ChiMerge method, but rather than trying
to merge adjacent intervals locally, FUSINTER tries to find the partition which
optimizes the measure. Next, we provide a short description of the FUSINTER
algorithm.
1. Obtain the boundary cut points after sorting the values in increasing order and forming intervals from runs of examples of the same class. Several classes may overlap at a single cut point.
2. Construct a matrix with a structure similar to a quanta matrix.
3. Find two adjacent intervals whose merging would improve the value of the criterion and check whether they can be merged using a differential criterion.
4. Repeat until no improvement is possible or only one interval remains.
Two criteria can be used to decide whether intervals are merged: the first is Shannon's
entropy and the second is the quadratic entropy.
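A minimal sketch of this greedy scheme is given below, assuming NumPy. The quanta matrix is represented as a list of per-interval class-count rows, and the evaluation combines a frequency-weighted quadratic (Gini) entropy with a simple penalty that favours larger intervals, standing in for the regularization and Laplace correction of the actual FUSINTER measure; all names, the penalty weight, and the counts are illustrative.

```python
import numpy as np

def partition_cost(intervals, lam=0.2):
    """Frequency-weighted quadratic (Gini) entropy plus a per-interval
    penalty that shrinks as intervals grow (illustrative stand-in for
    the regularized FUSINTER criterion)."""
    total = sum(v.sum() for v in intervals)
    n_classes = len(intervals[0])
    cost = 0.0
    for v in intervals:
        n = v.sum()
        p = v / n
        cost += (n / total) * np.sum(p * (1.0 - p)) + lam * n_classes / n
    return cost

def fusinter_like_merge(rows, lam=0.2):
    """Greedily merge the pair of adjacent intervals that yields the best
    improvement of the criterion, until no merge improves it or a single
    interval remains."""
    intervals = [np.asarray(r, dtype=float) for r in rows]
    while len(intervals) > 1:
        current = partition_cost(intervals, lam)
        best_gain, best_i = 0.0, None
        for i in range(len(intervals) - 1):
            candidate = (intervals[:i]
                         + [intervals[i] + intervals[i + 1]]
                         + intervals[i + 2:])
            gain = current - partition_cost(candidate, lam)
            if gain > best_gain:
                best_gain, best_i = gain, i
        if best_i is None:        # no adjacent merge improves the criterion
            break
        intervals[best_i:best_i + 2] = [intervals[best_i] + intervals[best_i + 1]]
    return intervals

# Quanta matrix: one row per initial interval, one column per class (made-up data).
print(fusinter_like_merge([[3, 0], [2, 1], [0, 4], [1, 3]]))
```

With the penalty removed (lam = 0), the weighted quadratic entropy alone can never improve by merging, since pooling two intervals never decreases a concave impurity measure; this is one reason the original measure includes a regularization component rewarding coarser partitions.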
9.4 Experimental Comparative Analysis
This section presents the experimental framework, the results collected, and the discussion
of them. Sect. 9.4.1 describes the complete experimental set up. Then,
Sect. 9.4.2 offers the study and analysis of the results obtained over the data sets used.
9.4.1 Experimental Set up
The goal of this section is to show all the properties and issues related to the experimental
study. We specify the data sets, validation procedure, classifiers used, parameters
of the classifiers and discretizers, and performance metrics. The data sets and the
statistical tests used to contrast the results were described in Chap. 2 of this book; here,
we only specify the names of the data sets used. The performance of the discretization
algorithms is analyzed using 40 data sets taken from the UCI ML Database
Repository [8] and the KEEL data set repository [3]. They are enumerated in Table 9.2.
In this study, six classifiers have been used in order to find differences in performance
among the discretizers. The classifiers are:
 