point for splitting or adjacent intervals for merging, (3) splitting or merging intervals
of continuous values according to some defined criterion, and (4) stopping at some
point. Next, we explain these four steps in detail.
9.2.1.5 Sorting
The continuous values for a feature are sorted in either descending or ascending
order. It is crucial to use an efficient sorting algorithm with a time complexity of
O(N log N), for instance the well-known Quick Sort algorithm, which achieves this
bound on average. Sorting must be done only once, at the start of discretization.
It is a mandatory treatment and can be applied globally when the complete instance
space is used for discretization. However, if the discretization takes place within
the process of another algorithm (such as decision tree induction), it is a local
treatment and only a region of the whole instance space is considered for
discretization.
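As a minimal sketch of the sorting step, the snippet below sorts one feature a single time while keeping the class labels aligned with their values. The function name sort_feature and the toy data are assumptions made for illustration; Python's built-in sorted (an O(N log N) sort) stands in for Quick Sort.

```python
def sort_feature(values, labels):
    """Sort one continuous feature in ascending order, keeping labels aligned."""
    # Sort the indices by value so that values and labels are permuted together.
    order = sorted(range(len(values)), key=values.__getitem__)
    return [values[i] for i in order], [labels[i] for i in order]


# Hypothetical toy data: five instances of one continuous feature.
values = [4.2, 1.5, 3.3, 1.5, 2.8]
labels = ["b", "a", "b", "a", "a"]

# Sorting is done once, before any cut point is evaluated.
sorted_values, sorted_labels = sort_feature(values, labels)
```

The sorted pair can then be reused by every later step, so the O(N log N) cost is paid only once per feature.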
9.2.1.6 Selection of a Cut Point
After sorting, the best cut point or the best pair of adjacent intervals must be found
in the attribute range, so that it can be split or merged in the following step. An
evaluation measure or function is used to determine the correlation, gain, improvement
in performance, or any other benefit with respect to the class label. There are
numerous evaluation measures, which are discussed in Sect. 9.3.1; entropy and
statistical dependency are the best known.
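To make the selection step concrete, here is a hedged sketch that uses class entropy as the evaluation measure. It assumes the values are already sorted in ascending order, takes candidate cut points at the midpoints between distinct adjacent values, and picks the one with the lowest weighted entropy. The names entropy and best_cut_point are illustrative, not taken from the text.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return sum((c / n) * log2(n / c) for c in Counter(labels).values())


def best_cut_point(sorted_values, sorted_labels):
    """Return (cut, weighted_entropy) for the best binary cut, or None."""
    n = len(sorted_values)
    best = None
    for i in range(1, n):
        # Equal adjacent values cannot be separated by a cut point.
        if sorted_values[i] == sorted_values[i - 1]:
            continue
        cut = (sorted_values[i - 1] + sorted_values[i]) / 2
        # Weighted average of the class entropy on each side of the cut.
        score = (i * entropy(sorted_labels[:i])
                 + (n - i) * entropy(sorted_labels[i:])) / n
        if best is None or score < best[1]:
            best = (cut, score)
    return best


# Hypothetical toy data, already sorted by value.
cut, score = best_cut_point([1.5, 1.5, 2.8, 3.3, 4.2],
                            ["a", "a", "a", "b", "b"])
```

In this toy example the cut between 2.8 and 3.3 separates the two classes perfectly, so its weighted entropy is zero. Any other measure could be substituted for entropy without changing the search loop.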
9.2.1.7 Splitting/Merging
Depending on the operation method of the discretizer, intervals can be either split or
merged. For splitting, all possible cut points from the whole universe within an
attribute must be evaluated. The universe is formed from all the different real values
present in an attribute. Then, the best one is found and the continuous range is split
into two partitions. Discretization continues with each part until a stopping
criterion is satisfied. Similarly, for merging, instead of finding the best cut point,
the discretizer aims to find the best pair of adjacent intervals to merge in each
iteration. Discretization continues with the reduced number of intervals until the
stopping criterion is satisfied.
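The splitting variant can be sketched as a short recursive procedure. This is only an illustration under stated assumptions: values are pre-sorted, entropy is the evaluation measure, and the stopping criterion is a hypothetical minimum entropy reduction (min_gain). The function names are invented for the example. A merging discretizer would follow the same loop shape bottom-up, starting from one interval per distinct value.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return sum((c / n) * log2(n / c) for c in Counter(labels).values())


def best_split(values, labels):
    """Return (weighted_entropy, index, cut) of the best cut, or None."""
    n, best = len(values), None
    for i in range(1, n):
        if values[i] == values[i - 1]:
            continue  # no boundary between equal values
        score = (i * entropy(labels[:i])
                 + (n - i) * entropy(labels[i:])) / n
        if best is None or score < best[0]:
            best = (score, i, (values[i - 1] + values[i]) / 2)
    return best


def split_discretize(values, labels, min_gain=1e-9):
    """Top-down discretization of pre-sorted values; returns sorted cut points."""
    best = best_split(values, labels)
    # Assumed stopping criterion: stop when the best cut reduces the
    # class entropy by less than min_gain (or when no cut exists).
    if best is None or entropy(labels) - best[0] < min_gain:
        return []
    _, i, cut = best
    # Recurse on each partition independently, as in the splitting scheme.
    left = split_discretize(values[:i], labels[:i], min_gain)
    right = split_discretize(values[i:], labels[i:], min_gain)
    return left + [cut] + right


# Hypothetical toy data, already sorted by value.
cuts = split_discretize([1, 2, 3, 4, 5, 6], list("aabbaa"))
```

On this toy input the procedure returns two cut points, 2.5 and 4.5, which yield three class-pure intervals.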
9.2.1.8 Stopping Criteria
The stopping criterion specifies when to stop the discretization process. It should
assume a trade-off between lower arity, which gives better understanding or
simplicity, and high accuracy
 