the learning algorithm [ 75 ]. Almost all known discretizers are static, because most dynamic discretizers are really subparts or stages of DM algorithms that operate on numerical data [ 13 ]. Some examples of well-known dynamic techniques are the ID3 discretizer [ 92 ] and ITFP [ 6 ].
Univariate versus Multivariate: Multivariate techniques, also known as 2D dis-
cretization [ 81 ], simultaneously consider all attributes to define the initial set of
cut points or to decide the best cut point altogether. They can also discretize one
attribute at a time when studying the interactions with other attributes, exploiting
high order relationships. By contrast, univariate discretizers only work with a sin-
gle attribute at a time, once an order among attributes has been established, and
the resulting discretization scheme in each attribute remains unchanged in later
stages. Interest has recently arisen in developing multivariate discretizers, since they are very influential in deductive learning [ 10 , 49 ] and in complex classification problems with high interactions among multiple attributes, which univariate discretizers might overlook [ 42 , 121 ].
Supervised versus Unsupervised: Unsupervised discretizers do not consider the
class label whereas supervised ones do. The manner in which the latter consider
the class attribute depends on the interaction between input attributes and class
labels, and the heuristic measures used to determine the best cut points (entropy,
interdependence, etc.). Most discretizers proposed in the literature are supervised and, by exploiting class information, should in theory determine the best number of intervals for each attribute automatically. That a discretizer is unsupervised does not mean it cannot be applied to supervised tasks; a supervised discretizer, however, can only be applied to supervised DM problems. Representative
unsupervised discretizers are EqualWidth and EqualFrequency [ 73 ], PKID and
FFD [ 122 ] and MVD [ 10 ].
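To illustrate the unsupervised family, the following is a minimal sketch of the cut-point selection behind equal-width and equal-frequency binning (the ideas underlying EqualWidth and EqualFrequency); function names and data are illustrative, not taken from any particular implementation:

```python
def equal_width(values, k):
    """Unsupervised equal-width binning: k intervals of equal size
    spanning the attribute's range (a sketch of the EqualWidth idea)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + i * width for i in range(1, k)]

def equal_frequency(values, k):
    """Unsupervised equal-frequency binning: cut points chosen so each
    interval holds roughly the same number of observed values."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // k] for i in range(1, k)]

vals = [1, 2, 2, 3, 8, 9, 10, 20]
print(equal_width(vals, 4))       # -> [5.75, 10.5, 15.25]
print(equal_frequency(vals, 4))   # -> [2, 8, 10]
```

Note that neither function ever inspects a class label, which is precisely what makes these methods unsupervised and applicable to both supervised and unsupervised tasks.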
Splitting versus Merging: This refers to the procedure used to create or define new
intervals. Splitting methods establish a cut point among all the possible boundary
points and divide the domain into two intervals. By contrast, merging methods start with a pre-defined partition and remove a candidate cut point to mix both adjacent
intervals. These properties are closely related to the Top-Down and Bottom-Up approaches, respectively (explained in the next section). The underlying idea is very similar, except that top-down and bottom-up discretizers assume the process is incremental (described later), following a hierarchical construction of the discretization. In fact,
there can be discretizers whose operation is based on splitting or merging more
than one interval at a time [ 72 , 96 ]. Also, some discretizers can be considered
hybrid due to the fact that they can alternate splits with merges in running time
[ 24 , 43 ].
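To illustrate the splitting side, here is a minimal sketch of a single splitting step under an assumed weighted class-entropy criterion; the names and the criterion are illustrative, and real splitting discretizers such as MDLP additionally apply a stopping rule before recursing:

```python
from math import log2

def entropy(labels):
    """Shannon entropy of the class distribution in labels."""
    n = len(labels)
    return -sum((labels.count(c) / n) * log2(labels.count(c) / n)
                for c in set(labels))

def best_split(values, labels):
    """One splitting step: scan every boundary between sorted distinct
    values and return the cut point whose two halves minimise the
    weighted class entropy (the basic move of top-down, entropy-based
    discretizers)."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_score, best_cut = float('inf'), None
    for i in range(1, n):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # equal values: no boundary point here
        left = [c for _, c in pairs[:i]]
        right = [c for _, c in pairs[i:]]
        score = (len(left) * entropy(left) + len(right) * entropy(right)) / n
        if score < best_score:
            best_score = score
            best_cut = (pairs[i - 1][0] + pairs[i][0]) / 2
    return best_cut

# classes separate cleanly, so the best cut falls between 3 and 8
print(best_split([1, 2, 3, 8, 9, 10], ['a', 'a', 'a', 'b', 'b', 'b']))  # -> 5.5
```

A merging method would run the mirror procedure: start from a fine partition and repeatedly remove the cut point whose elimination least harms (or most improves) the chosen measure.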
Global versus Local: To make a decision, a discretizer can either require all
available data in the attribute or use only partial information. A discretizer is said
to be local when it only makes the partition decision based on local information.
Examples of widely used local techniques are MDLP [ 41 ] and ID3 [ 92 ]. Few discretizers are local; the exceptions are mainly those based on top-down partitioning, together with all the dynamic
techniques. In a top-down process, some algorithms follow the divide-and-conquer
scheme and when a split is found, the data is recursively divided, restricting access