or consistency. For example, a threshold for m can be an upper bound for the arity
of the resulting discretization. A stopping criterion can be very simple, such as fixing
the number of final intervals at the beginning of the process, or more complex, such as
estimating a function.
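As a rough illustration of the simple case, the sketch below merges intervals bottom-up until a pluggable stopping criterion is satisfied. The names `merge_until` and `max_intervals`, and the merge rule (drop the cut point that sits in the narrowest gap), are assumptions made for this example only; they do not correspond to any specific discretizer discussed in the text.

```python
from typing import Callable, List


def merge_until(values: List[float],
                stop: Callable[[List[float]], bool]) -> List[float]:
    """Bottom-up merging: start with a cut point between every pair of
    adjacent distinct values and drop one cut point at a time until
    `stop` accepts the current set of cut points."""
    xs = sorted(set(values))
    # Each candidate is (cut point, width of the gap it sits in).
    cuts = [((a + b) / 2, b - a) for a, b in zip(xs, xs[1:])]
    while cuts and not stop([c for c, _ in cuts]):
        # Hypothetical merge rule: remove the cut in the narrowest gap.
        narrowest = min(range(len(cuts)), key=lambda i: cuts[i][1])
        del cuts[narrowest]
    return [c for c, _ in cuts]


def max_intervals(m: int) -> Callable[[List[float]], bool]:
    """Simple stopping criterion: at most m final intervals
    (k cut points produce k + 1 intervals)."""
    return lambda cuts: len(cuts) + 1 <= m


values = [1.0, 2.0, 3.0, 10.0, 11.0, 30.0]
cuts = merge_until(values, max_intervals(3))
print(cuts)
```

A more complex criterion would only change the function passed as `stop`, for instance one that evaluates a quality measure over the current cut points instead of counting them.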
9.2.2 Related and Advanced Work
Improving and analyzing discretization remains an active and in-demand research area.
Depending on the DM task, discretization can be a promising way to obtain the desired
results, which explains its close relationship to other methods and problems.
This section briefly summarizes topics closely related to discretization from both
a theoretical and a practical point of view, and describes other works and trends
that have been studied in the last few years.
Discretization Specific Analysis: Susmaga proposed an analysis method for
discretizers based on binarization of continuous attributes and rough set measures
[104]. He emphasized that his analysis method is useful for detecting redundancy
in a discretization and the set of cut points that can be removed without
decreasing performance. It can also be applied to improve existing discretization
approaches.
Optimal Multisplitting: Elomaa and Rousu characterized some fundamental
properties for using some classic evaluation functions in supervised univariate
discretization. They analyzed entropy, information gain, gain ratio, training set error,
Gini index and normalized distance measure, concluding that they are suitable
for use in the optimal multisplitting of an attribute [38]. They also developed
an optimal algorithm for performing this multisplitting process and devised two
techniques [39, 40] to speed it up.
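To make some of these evaluation functions concrete, the following minimal sketch scores a candidate multisplit (a list of cut points on one attribute) with entropy, information gain, the Gini index and training set error. It uses the standard textbook definitions; the helper names (`partition`, `weighted`, and so on) and the toy data are assumptions for illustration, and the code does not reproduce the optimal algorithm of Elomaa and Rousu.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence


def partition(values: Sequence[float], labels: Sequence[str],
              cuts: Sequence[float]) -> List[List[str]]:
    """Group class labels by the interval each value falls into."""
    bins: List[List[str]] = [[] for _ in range(len(cuts) + 1)]
    for v, y in zip(values, labels):
        idx = sum(v > c for c in cuts)  # number of cut points below v
        bins[idx].append(y)
    return [b for b in bins if b]  # ignore empty intervals


def entropy(labels: Sequence[str]) -> float:
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini(labels: Sequence[str]) -> float:
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def weighted(impurity: Callable[[Sequence[str]], float],
             parts: List[List[str]]) -> float:
    """Size-weighted average impurity over the intervals of a multisplit."""
    n = sum(len(p) for p in parts)
    return sum(len(p) / n * impurity(p) for p in parts)


def information_gain(labels: Sequence[str], parts: List[List[str]]) -> float:
    return entropy(labels) - weighted(entropy, parts)


def training_set_error(parts: List[List[str]]) -> float:
    """Fraction of instances outside the majority class of their interval."""
    n = sum(len(p) for p in parts)
    return sum(len(p) - max(Counter(p).values()) for p in parts) / n


values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
labels = ["a", "a", "b", "b", "a", "a"]
three_way = partition(values, labels, cuts=[2.5, 4.5])
two_way = partition(values, labels, cuts=[3.5])
print(information_gain(labels, three_way), training_set_error(three_way))
print(information_gain(labels, two_way), training_set_error(two_way))
```

On this toy attribute the three-way split isolates the block of `b` instances, so it scores better than the binary split on every measure shown, which is the kind of case where multisplitting pays off.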
Discretization of Continuous Labels: Two approaches have been used to convert
a continuous supervised learning problem (regression) into a nominal supervised
learning problem (classification). The first one is simply to use regression tree
algorithms, such as CART [17]. The second consists of applying discretization
to the output attribute, either statically [46] or in a dynamic fashion [61].
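The second approach can be sketched as follows, assuming that "static" means the output attribute is binned once, before any learner is trained. Equal-frequency binning is used here only because it is simple; the cited works may use other schemes, and the function names `equal_frequency_cuts` and `to_class_labels` are hypothetical.

```python
from typing import List, Sequence


def equal_frequency_cuts(y: Sequence[float], k: int) -> List[float]:
    """Cut points that split the sorted target values into k groups of
    (roughly) equal size. Duplicate cut points are dropped."""
    if not 1 <= k <= len(y):
        raise ValueError("k must be between 1 and the number of targets")
    ys = sorted(y)
    n = len(ys)
    cuts: List[float] = []
    for i in range(1, k):
        pos = i * n // k
        cut = (ys[pos - 1] + ys[pos]) / 2  # midpoint between neighbours
        if not cuts or cut > cuts[-1]:
            cuts.append(cut)
    return cuts


def to_class_labels(y: Sequence[float], cuts: Sequence[float]) -> List[int]:
    """Replace each continuous target by the index of its interval."""
    return [sum(v > c for c in cuts) for v in y]


y = [3.0, 10.0, 1.0, 8.0, 2.0, 9.0]   # continuous regression targets
cuts = equal_frequency_cuts(y, k=3)
classes = to_class_labels(y, cuts)     # nominal targets for a classifier
print(cuts, classes)
```

Any classifier can then be trained on `classes` in place of `y`; a dynamic variant would instead revise the cut points while the model is being learned.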
Fuzzy Discretization: Extensive research has been carried out on the definition
of linguistic terms that divide the attribute domain into fuzzy regions [62]. Fuzzy
discretization is characterized by the membership value, the group or interval number
and the affinity corresponding to an attribute value, unlike crisp discretization,
which only considers the interval number [95].
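The contrast between the two can be illustrated with a small sketch. It assumes triangular membership functions and three linguistic terms ("low", "medium", "high") over the domain [0, 10]; both choices are made up for this example, and the sketch covers only membership values and interval numbers, not affinity.

```python
from typing import Dict, Sequence, Tuple


def triangular(x: float, left: float, peak: float, right: float) -> float:
    """Membership of x in a triangular fuzzy region (left, peak, right)."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# Hypothetical linguistic terms for an attribute with domain [0, 10].
TERMS: Dict[str, Tuple[float, float, float]] = {
    "low": (-5.0, 0.0, 5.0),
    "medium": (0.0, 5.0, 10.0),
    "high": (5.0, 10.0, 15.0),
}


def fuzzy_discretize(x: float) -> Dict[str, float]:
    """Fuzzy view: one membership value per linguistic term."""
    return {name: triangular(x, *abc) for name, abc in TERMS.items()}


def crisp_discretize(x: float, cuts: Sequence[float] = (2.5, 7.5)) -> int:
    """Crisp view: only the interval number is kept."""
    return sum(x > c for c in cuts)


x = 4.0
memberships = fuzzy_discretize(x)
interval = crisp_discretize(x)
print(memberships, interval)
```

For `x = 4.0` the crisp discretizer reports only interval 1, while the fuzzy one records that the value is mostly "medium" but still partly "low".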
Cost-Sensitive Discretization: The objective of cost-based discretization is to take
into account the cost of making errors instead of just minimizing the total number of
errors [63]. It is related to problems of imbalanced or cost-sensitive classification
[57, 103].
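The difference between the two objectives can be seen in a minimal sketch that scores a discretization once by its error count and once by its misclassification cost. The cost matrix, the class names and the helper names are assumptions for illustration; the cited works may define cost differently.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical cost matrix, indexed as COST[true_class][predicted_class]:
# missing a "pos" instance is five times worse than a false alarm.
COST: Dict[str, Dict[str, float]] = {
    "neg": {"neg": 0.0, "pos": 1.0},
    "pos": {"neg": 5.0, "pos": 0.0},
}


def error_count(parts: List[List[str]]) -> int:
    """Errors when every interval predicts its majority class."""
    return sum(len(p) - max(Counter(p).values()) for p in parts)


def min_cost(parts: List[List[str]],
             cost: Dict[str, Dict[str, float]]) -> float:
    """Total cost when every interval predicts its cheapest class."""
    classes = list(cost)
    total = 0.0
    for p in parts:
        counts = Counter(p)
        total += min(
            sum(counts[true] * cost[true][pred] for true in classes)
            for pred in classes
        )
    return total


# Class labels of six instances sorted by attribute value,
# split by two alternative single cut points.
split_a = [["neg", "neg", "pos", "neg"], ["pos", "pos"]]
split_b = [["neg", "neg"], ["pos", "neg", "pos", "pos"]]
print(error_count(split_a), min_cost(split_a, COST))
print(error_count(split_b), min_cost(split_b, COST))
```

Both splits make a single error, so an error-based criterion cannot tell them apart, whereas the cost-based score prefers `split_b` because its one mistake is the cheaper kind.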
 