9.3.3 Description of the Most Representative Discretization Methods
This section provides an in-depth description of the most representative
methods, chosen according to the previous taxonomy, their degree of usage in
the specialized literature in recent years, and the results reported in the
following section of this chapter, which presents the experimental comparative
analysis. We will discuss 10 discretizers in more detail: Equal Width/Frequency,
MDLP, ChiMerge, Distance, Chi2, PKID, FFD, FUSINTER, CAIM and Modified Chi2.
They are grouped according to their splitting or merging criterion, which can be
explained separately from the remaining mechanisms of each discretizer because
it is usually a shared process.
9.3.3.1 Splitting Methods
We start with a generalized pseudocode for splitting discretization methods.
Algorithm 1 Splitting Algorithm
Require: S = Sorted values of attribute A
procedure Splitting(S)
    if StoppingCriterion() == true then
        Return
    end if
    T = GetBestSplitPoint(S)
    S1 = GetLeftPart(S, T)
    S2 = GetRightPart(S, T)
    Splitting(S1)
    Splitting(S2)
end procedure
The splitting algorithm above covers all four steps of the discretization scheme:
(1) sort the feature values, (2) search for an appropriate cut point, (3) split the
range of continuous values according to the cut point, and (4) stop when a stopping
criterion is satisfied, otherwise go to (2). In the following, we describe the most
representative discretizers based on a splitting criterion.
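As an illustration only, the recursive scheme of Algorithm 1 can be sketched in Python. The stopping criterion and the split-point search are passed in as functions, since each discretizer defines its own. The two helpers used here, too_small and largest_gap_midpoint, are hypothetical placeholders invented for this sketch and do not correspond to any of the discretizers discussed in this chapter.

```python
from typing import Callable, List, Optional, Sequence


def splitting(
    values: Sequence[float],
    stop: Callable[[Sequence[float]], bool],
    best_split: Callable[[Sequence[float]], Optional[float]],
) -> List[float]:
    """Recursively split sorted `values`; return the cut points in ascending order."""
    # Step (4): stop when the stopping criterion is satisfied.
    if stop(values):
        return []
    # Step (2): search for an appropriate cut point.
    cut = best_split(values)
    if cut is None:
        return []
    # Step (3): split the range according to the cut point.
    left = [v for v in values if v <= cut]
    right = [v for v in values if v > cut]
    if not left or not right:
        return []  # degenerate split, nothing more to do
    return splitting(left, stop, best_split) + [cut] + splitting(right, stop, best_split)


def too_small(values: Sequence[float], min_size: int = 4) -> bool:
    """Hypothetical stopping criterion: stop when fewer than `min_size` values remain."""
    return len(values) < min_size


def largest_gap_midpoint(values: Sequence[float]) -> Optional[float]:
    """Hypothetical split search: midpoint of the widest gap between neighbours."""
    gaps = [(b - a, (a + b) / 2) for a, b in zip(values, values[1:])]
    if not gaps:
        return None
    width, midpoint = max(gaps)
    return midpoint if width > 0 else None


# Step (1): the values must be sorted before the first call.
data = sorted([12, 1, 30, 2, 11, 3, 31, 10])
cuts = splitting(data, too_small, largest_gap_midpoint)
```

With the sample data above, cuts evaluates to [6.5, 21.0]: the first call splits at the widest gap (between 12 and 30), and the left part is split once more between 3 and 10 before every remaining part falls below the minimum size. Unlike Algorithm 1, the sketch returns the cut points explicitly instead of relying on side effects, which is a design choice of this example rather than part of the original pseudocode.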
Equal Width or Frequency [ 75 ]
Equal width and equal frequency belong to the simplest family of methods, which
discretize an attribute by creating a specified number of bins of either equal width
or equal frequency. This family is known as Binning methods. The arity m must be
specified at the beginning of both discretizers and determines the number of bins.
Each bin is associated with a different discrete value. In equal width, the continuous
range of a feature is divided into intervals that have an equal width and each interval
represents a bin. The arity can be calculated by the relationship between the chosen
width for each interval and the total length of the attribute range. In equal frequency,
an equal number of
 
 