and, in most of the studies, no rigorous empirical analysis has been carried out. In [51], it was observed that the most frequently compared techniques are EqualWidth, EqualFrequency, MDLP [41], ID3 [92], ChiMerge [68], 1R [59], D2 [19], and Chi2 [76].
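To give a flavor of the simplest techniques named above, the following is a minimal sketch (not taken from the chapter) of the two classic unsupervised discretizers, EqualWidth and EqualFrequency; the function names are illustrative, not from any cited work:

```python
def equal_width(values, k):
    """EqualWidth: split the range of `values` into k intervals of equal
    width and return the k-1 interior cut points."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + i * width for i in range(1, k)]

def equal_frequency(values, k):
    """EqualFrequency: choose k-1 interior cut points so that each of the
    k intervals holds roughly the same number of values."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[i * n // k] for i in range(1, k)]
```

On a skewed sample the two yield different cut points: EqualWidth ignores the data distribution, while EqualFrequency adapts the interval boundaries to it.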
These reasons motivate the global purpose of this chapter. We can summarize it
into three main objectives:
• To provide an updated and complete taxonomy based on the main properties observed in discretization methods. The taxonomy will allow us to characterize their advantages and drawbacks in order to choose a discretizer from a theoretical point of view.
• To carry out an empirical study analyzing the most representative and newest discretizers in terms of the number of intervals obtained and the inconsistency level of the data.
• Finally, to relate the best discretizers to a set of representative DM models, using two metrics to measure predictive classification success.
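The inconsistency level mentioned in the second objective can be computed as follows; this is a hedged sketch of the usual definition (the fraction of instances not covered by the majority class of their discretized pattern), with hypothetical names, since the chapter has not yet defined the measure formally:

```python
from collections import Counter, defaultdict

def inconsistency_rate(discretized_rows, labels):
    """For each distinct discretized feature pattern, count its instances
    minus the count of its majority class (the 'conflicting' instances),
    then divide the total by the number of instances N."""
    groups = defaultdict(list)
    for row, y in zip(discretized_rows, labels):
        groups[tuple(row)].append(y)
    conflicting = sum(len(ys) - max(Counter(ys).values())
                      for ys in groups.values())
    return conflicting / len(labels)
```

A rate of 0 means every discretized pattern maps to a single class; higher values indicate that the discretization has merged instances of different classes into the same intervals.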
9.2 Perspectives and Background
Discretization is a wide field in which there have been many advances and ideas over the years. This section provides the necessary background on the topic, together with a set of related areas and future perspectives on discretization.
9.2.1 Discretization Process
Before starting, we first introduce, for the sake of unification, some terms that are used differently across sources.
9.2.1.1 Feature
Also called an attribute or variable, a feature refers to an aspect of the data and is usually associated with a column in a data table. M denotes the number of features in the data.
9.2.1.2 Instance
Also called a tuple, example, record, or data point, an instance refers to a collection of values for all features. A set of instances constitutes a data set, and instances are usually associated with rows in a data table. Following the introduction, N denotes the number of instances in the data.
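The terminology above can be made concrete with a hypothetical toy data set (the values are invented for illustration):

```python
# N instances (rows), each a collection of values for the M features (columns).
data = [
    [5.1, 3.5, 1.4],   # one instance (row)
    [4.9, 3.0, 1.4],
    [6.2, 2.9, 4.3],
]
N = len(data)      # number of instances
M = len(data[0])   # number of features
```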