where:
  c    = number of classes
  m    = number of intervals
  N_ij = number of distinct values in the i-th interval, j-th class
  R_i  = number of examples in the i-th interval = Σ_{j=1}^{c} N_ij
  C_j  = number of examples in the j-th class = Σ_{i=1}^{m} N_ij
  N    = total number of examples = Σ_{j=1}^{c} C_j
  E_ij = expected frequency of N_ij = (R_i × C_j) / N
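Using the definitions above, the χ² value for a pair of adjacent intervals can be computed directly from the 2 × c contingency table of class counts. The following sketch is illustrative only; the function name and data layout are assumptions, not part of the original description.

```python
# A minimal sketch of the chi-square computation for two adjacent
# intervals, following the definitions above. The 2 x c table `n`
# holds N_ij: row i = interval, column j = class. Illustrative names.

def chi2_statistic(n):
    """Chi-square value for a pair of adjacent intervals.

    n: list of two rows, each with c class counts (N_ij).
    """
    m = len(n)                       # number of intervals (2 here)
    c = len(n[0])                    # number of classes
    R = [sum(row) for row in n]      # R_i: examples in interval i
    C = [sum(n[i][j] for i in range(m)) for j in range(c)]  # C_j
    N = sum(R)                       # total number of examples
    chi2 = 0.0
    for i in range(m):
        for j in range(c):
            E = R[i] * C[j] / N      # expected frequency E_ij
            if E > 0:                # skip empty classes
                chi2 += (n[i][j] - E) ** 2 / E
    return chi2

# Identical class distributions give 0 (good candidates for merging):
print(chi2_statistic([[4, 2], [4, 2]]))  # -> 0.0
```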
ChiMerge is a supervised, bottom-up discretizer. At the beginning, each distinct value of the attribute is considered to be one interval. χ² tests are performed for every pair of adjacent intervals, and the pair of adjacent intervals with the lowest χ² value is merged. Merging continues until the chosen stopping criterion is satisfied. The significance level for χ² is an input parameter that determines the threshold for the stopping criterion. Another parameter, called max-interval, can be included to prevent an excessive number of intervals from being created. The recommended value for the significance level lies in the range from 0.90 to 0.99, and the max-interval parameter should be set to 10 or 15.
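The merging loop just described can be sketched as follows. This is a minimal illustration assuming intervals are represented by their per-class count rows; the function names and the way the threshold is supplied are assumptions, not part of the original description.

```python
# A sketch of the ChiMerge loop. Intervals are per-class count rows
# (N_ij); names and parameter handling are illustrative.

def chi2_pair(r1, r2):
    """Chi-square statistic for two adjacent intervals (a 2 x c table)."""
    C = [a + b for a, b in zip(r1, r2)]   # class totals C_j
    N = sum(C)                            # total examples
    chi2 = 0.0
    for row in (r1, r2):
        R = sum(row)                      # interval total R_i
        for j, n_ij in enumerate(row):
            E = R * C[j] / N              # expected frequency E_ij
            if E > 0:
                chi2 += (n_ij - E) ** 2 / E
    return chi2

def chi_merge(intervals, threshold, max_interval=15):
    """Merge adjacent intervals until the lowest chi-square exceeds
    `threshold`, while respecting the max-interval budget."""
    intervals = [list(row) for row in intervals]
    while len(intervals) > 1:
        scores = [chi2_pair(intervals[i], intervals[i + 1])
                  for i in range(len(intervals) - 1)]
        i = min(range(len(scores)), key=scores.__getitem__)
        # stop once the best candidate pair is significant enough,
        # unless we are still over the max-interval budget
        if scores[i] > threshold and len(intervals) <= max_interval:
            break
        # merge the two adjacent intervals by summing class counts
        intervals[i:i + 2] = [[a + b for a, b in
                               zip(intervals[i], intervals[i + 1])]]
    return intervals
```

For two classes (one degree of freedom) and a significance level of 0.90, the threshold would be the χ² critical value 2.706.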
Chi2 [76]
Chi2 can be described as an automated version of ChiMerge. Here, the statistical significance level keeps changing to merge more and more adjacent intervals as long as an inconsistency criterion is satisfied, where an inconsistency is understood as two instances that match on their attribute values but belong to different classes. It is even possible to completely remove an attribute when no inconsistency appears during its discretization, so the method also acts as a feature selector. Like ChiMerge, the χ² statistic is used to discretize the continuous attributes until some inconsistencies are found in the data.
The stopping criterion is reached when inconsistencies appear in the data, considering a limit of zero or a δ inconsistency level as default.
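The inconsistency criterion can be sketched as a simple count: within each group of instances whose (discretized) attribute patterns match, every instance outside the majority class is inconsistent. The function name and data layout below are assumptions for illustration.

```python
# A sketch of the inconsistency rate driving Chi2's stopping
# criterion: instances that match on all discretized attribute
# values but carry different class labels. Illustrative names.

from collections import Counter, defaultdict

def inconsistency_rate(rows, labels):
    """rows: discretized attribute vectors; labels: class labels."""
    groups = defaultdict(Counter)
    for row, y in zip(rows, labels):
        groups[tuple(row)][y] += 1
    # in each matching group, all but the majority class count as
    # inconsistent
    bad = sum(sum(cnt.values()) - max(cnt.values())
              for cnt in groups.values())
    return bad / len(rows)

rows = [(0, 1), (0, 1), (0, 1), (1, 0)]
labels = ["a", "a", "b", "a"]
print(inconsistency_rate(rows, labels))  # -> 0.25
```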
Modified Chi2 [105]
In the original Chi2 algorithm, the stopping criterion was defined as the point at which the inconsistency rate exceeded a predefined rate δ. The δ value could be given after some tests on the training data for different data sets. The modification proposed was to use the level of consistency, coined from Rough Sets Theory. Thus, this level of consistency replaces the basic inconsistency checking, ensuring that the fidelity of the training data is maintained after discretization and making the process completely automatic.
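Under Rough Sets Theory, the level of consistency can be read as the fraction of instances lying in the positive region, i.e. in groups of instances with identical attribute values that are pure in class. The sketch below illustrates this reading; the function names are assumptions and this is not the exact formulation of the cited paper.

```python
# A sketch of a Rough Sets-style level of consistency: the share of
# instances whose attribute pattern occurs with only one class label
# (the lower-approximation / positive-region reading). Illustrative
# names, not the exact formulation of the cited work.

from collections import Counter, defaultdict

def consistency_level(rows, labels):
    groups = defaultdict(Counter)
    for row, y in zip(rows, labels):
        groups[tuple(row)][y] += 1
    # a group contributes its instances only if it is pure in class
    pure = sum(sum(cnt.values())
               for cnt in groups.values() if len(cnt) == 1)
    return pure / len(labels)

def keeps_fidelity(raw_rows, disc_rows, labels):
    # accept a discretization only if it preserves the consistency
    # level of the original training data
    return consistency_level(disc_rows, labels) >= \
        consistency_level(raw_rows, labels)
```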
 