Predictive Classification Rate: A successful algorithm will often be able to
discretize the training set without significantly reducing the predictive capability,
on test data, of learners that are prepared to treat numerical data.
Time requirements: A static discretization process is carried out just once on a
training set, so running time does not seem to be a very important evaluation
criterion. However, if the discretization phase takes too long it can become
impractical for real applications. In dynamic discretization, the operation is
repeated as many times as the learner requires, so it should be performed efficiently.
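The difference in cost between the two settings can be illustrated with a minimal sketch. It assumes a plain equal-width discretizer and a hypothetical learner that re-discretizes each subset of data it visits; the function names, the sample values and the subsets are illustrative assumptions and do not come from the experimental study.

```python
def equal_width_cut_points(values, n_bins):
    """Return the n_bins - 1 interior cut points of an equal-width partition."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(1, n_bins)]


def discretize(values, cut_points):
    """Map each value to the index of its interval (right-closed intervals)."""
    return [sum(v > c for c in cut_points) for v in values]


# Illustrative data only (assumption, not taken from the chapter).
values = [0.5, 1.2, 3.3, 4.8, 7.1, 9.9, 10.0, 0.0]

# Static setting: the cut points are computed once, before learning starts.
static_cuts = equal_width_cut_points(values, 4)
static_calls = 1
codes = discretize(values, static_cuts)

# Dynamic setting: a hypothetical learner recomputes the cut points on every
# subset it visits, so the cost is paid once per visit.
subsets = [values[:4], values[4:], values[2:6]]
dynamic_cuts = [equal_width_cut_points(s, 2) for s in subsets]
dynamic_calls = len(subsets)
```

In this sketch the static discretizer runs once, whereas the dynamic one runs once per subset, which is why its efficiency matters more as the learner grows.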
9.3.2 Methods and Taxonomy
At the time of writing, more than 80 discretization methods have been proposed in the
literature. This section is devoted to enumerating and designating them according to
a standard followed in this chapter. We have used 30 discretizers in the experimental
study, those that we have identified as the most relevant ones. For more details on
their descriptions, the reader can visit the URL associated with the KEEL project. 1
Additionally, implementations of these algorithms in Java can be found in the KEEL
software [3, 4].
Table 9.1 presents an enumeration of discretizers reviewed in this chapter. The
complete name, abbreviation and reference are provided for each one. This chapter
does not collect the descriptions of the discretizers. Instead, we recommend that
readers consult the original references to understand the complete operation of the
discretizers of interest. Discretizers used in the experimental study are depicted in
bold. The ID3 discretizer used in the study is a static version of the well-known
discretizer embedded in C4.5.
The properties studied above can be used to categorize the discretizers proposed in
the literature. The seven characteristics studied allow us to present the taxonomy of
discretization methods in an established order. All techniques enumerated in Table 9.1
are collected in the taxonomy drawn in Fig. 9.2. It illustrates the categorization
following a hierarchy based on this order: static/dynamic, univariate/multivariate,
supervised/unsupervised, splitting/merging/hybrid, global/local, direct/incremental
and evaluation measure. The rationale behind the choice of this order is to achieve a
clear representation of the taxonomy.
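As a rough illustration of how such an ordered hierarchy can be built, the sketch below nests a few discretizers level by level, following the seven properties in the order given above. The helper name `build_taxonomy`, the `_methods` leaf key and the property values assigned to each method are assumptions made for the example; they are not read from Table 9.1 or Fig. 9.2.

```python
# Order of the seven properties, as listed in the text.
ORDER = (
    "static/dynamic",
    "univariate/multivariate",
    "supervised/unsupervised",
    "splitting/merging/hybrid",
    "global/local",
    "direct/incremental",
    "evaluation measure",
)

# Illustrative property values (assumptions, not taken from Fig. 9.2).
DISCRETIZERS = {
    "EqualWidth": ("static", "univariate", "unsupervised", "splitting",
                   "global", "direct", "binning"),
    "EqualFrequency": ("static", "univariate", "unsupervised", "splitting",
                       "global", "direct", "binning"),
    "ChiMerge": ("static", "univariate", "supervised", "merging",
                 "global", "incremental", "statistical"),
}


def build_taxonomy(methods):
    """Nest methods in a dict tree, one level per property in ORDER."""
    tree = {}
    for name, props in methods.items():
        if len(props) != len(ORDER):
            raise ValueError(f"{name}: expected {len(ORDER)} properties")
        node = tree
        for value in props:
            node = node.setdefault(value, {})
        # The leaf collects every method sharing the same seven values.
        node.setdefault("_methods", []).append(name)
    return tree


taxonomy = build_taxonomy(DISCRETIZERS)
family = taxonomy["static"]["univariate"]["unsupervised"]
```

Methods that share all seven values end up in the same leaf, which makes the size of each family, and the branches that remain empty, easy to inspect.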
The proposed taxonomy assists us in the organization of many discretization
methods so that we can classify them into categories and analyze their behavior. Also,
we can highlight other aspects in which the taxonomy can be useful. For example, it
provides a snapshot of existing methods and relations or similarities among them. It
also depicts the size of the families, the work done in each one and what is currently
missing. Finally, it provides a general overview of the state-of-the-art methods in
discretization for researchers/practitioners who are beginning in this field or need to
discretize data in real applications.
1 http://www.keel.es
 