reduced subset of discrete values. Once the discretization is performed, the data can
be treated as nominal data during any induction or deduction DM process. Many
existing DM algorithms are designed to learn only from categorical data, using nominal attributes, whereas real-world applications usually involve continuous features. Numerical features therefore have to be discretized before such algorithms can be used.
In supervised learning, and specifically classification, the topic of this survey, we can define discretization as follows. Assuming a data set consisting of $N$ examples and $C$ target classes, a discretization algorithm would discretize the continuous attribute $A$ in this data set into $m$ discrete intervals $D = \{[d_0, d_1], (d_1, d_2], \ldots, (d_{m-1}, d_m]\}$, where $d_0$ is the minimal value, $d_m$ is the maximal value and $d_i < d_{i+1}$, for $i = 0, 1, \ldots, m-1$. Such a discrete result $D$ is called a discretization scheme on attribute $A$, and $P = \{d_1, d_2, \ldots, d_{m-1}\}$ is the set of cut points of attribute $A$.
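To make this notation concrete, the following sketch (written in Python; the function name and the sample values are illustrative, not taken from any particular library) maps the values of a continuous attribute $A$ onto the intervals induced by a set of cut points $P$:

```python
# Illustrative sketch: applying a discretization scheme defined by the cut
# points P = {d_1, ..., d_{m-1}} to a continuous attribute A. Together with
# the minimum d_0 and maximum d_m, the cut points induce the intervals
# [d_0, d_1], (d_1, d_2], ..., (d_{m-1}, d_m].

def discretize(values, cut_points):
    """Return, for each continuous value, the index of the interval it falls in."""
    labels = []
    for v in values:
        # First cut point not exceeded by v; values above the last cut point
        # fall into the final interval (d_{m-1}, d_m].
        interval = next((i for i, d in enumerate(cut_points) if v <= d),
                        len(cut_points))
        labels.append(interval)
    return labels

# Example: cut points P = {5.0, 10.0} induce m = 3 intervals.
A = [1.2, 5.0, 7.3, 10.0, 12.8]
print(discretize(A, [5.0, 10.0]))  # -> [0, 0, 1, 1, 2]
```

Once the values are replaced by their interval indices (or by symbolic labels for those indices), the attribute can be treated as any other nominal attribute by the learning algorithm.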
The necessity of using discretization on data can be caused by several factors.
Many DM algorithms are primarily oriented to handle nominal attributes [ 36 , 75 ,
123 ], or may even only deal with discrete attributes. For instance, three of the ten
methods considered as the top ten in DM [ 120 ] require an embedded or an external
discretization of data: C4.5 [ 92 ], Apriori [ 1 ] and Naïve Bayes [ 44 , 122 ]. Even with
algorithms that are able to deal with continuous data, learning is less efficient and
effective [29, 94]. Other advantages derived from discretization are the reduction and simplification of the data, which make learning faster and yield more accurate, compact and shorter results; in addition, any noise possibly present in the data is reduced.
For both researchers and practitioners, discrete attributes are easier to understand,
use, and explain [ 75 ]. Nevertheless, any discretization process generally leads to a
loss of information, making the minimization of such information loss the main goal
of a discretizer.
Obtaining the optimal discretization is NP-complete [ 25 ]. A vast number of dis-
cretization techniques can be found in the literature. It is obvious that when dealing
with a concrete problem or data set, the choice of a discretizer will condition the success of the subsequent learning task in terms of accuracy, simplicity of the model, etc. Different
heuristic approaches have been proposed for discretization, for example, approaches
based on information entropy [36, 41], the statistical $\chi^2$ test [68, 76], likelihood [16, 119], rough sets [86, 124], etc. Other criteria have been used in order to provide a classification of discretizers, such as univariate/multivariate, supervised/unsupervised,
top-down/bottom-up, global/local, static/dynamic and more. All these criteria are
the basis of the taxonomies already proposed and they will be deeply elaborated
upon in this chapter. The identification of the best discretizer for each situation is a
very difficult task to carry out, but performing exhaustive experiments considering a
representative set of learners and discretizers could help to make the best choice.
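As an illustration of one of these heuristic families, the sketch below shows the class-information-entropy criterion on which entropy-based discretizers such as those in [36, 41] rely. It only evaluates candidate binary cut points; the recursive partitioning and the stopping rule (e.g., MDLP) that a complete method would add are omitted, and the function names and toy data are assumptions made for this example:

```python
# Sketch of the class-information-entropy criterion behind entropy-based
# discretizers: choose the cut point that minimizes the weighted class
# entropy of the two resulting intervals (i.e., maximizes information gain).
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def split_entropy(values, labels, cut):
    """Weighted class entropy after splitting the attribute at `cut`."""
    left = [y for x, y in zip(values, labels) if x <= cut]
    right = [y for x, y in zip(values, labels) if x > cut]
    n = len(labels)
    return (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)

# Toy attribute whose low values belong to class 'a' and high values to 'b'.
values = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]
labels = ['a', 'a', 'a', 'b', 'b', 'b']
best = min(set(values[:-1]), key=lambda c: split_entropy(values, labels, c))
print(best)  # -> 3.0, which separates the two classes perfectly
```

A supervised discretizer built on this criterion would apply it recursively inside each resulting interval until a stopping condition is met, whereas unsupervised methods such as equal-width or equal-frequency binning ignore the class labels entirely.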
Some reviews of discretization techniques can be found in the literature [ 9 , 36 , 75 ,
123 ]. However, the characteristics of the methods are not studied completely, many
discretizers, even classic ones, are not mentioned, and the notation used for catego-
rization is not unified. In spite of the wealth of literature, and apart from the absence of
a complete categorization of discretizers using a unified notation, it can be observed that there are few attempts to compare them empirically. As a result, the algorithms proposed are usually compared with a subset of the complete family of discretizers