each containing $c_a$ and $c_b$ classes. The Mantaras distance between the two partitions due to a single cut point is given below.
$$\mathrm{Dist}(S_a, S_b) = \frac{I(S_a \mid S_b) + I(S_b \mid S_a)}{I(S_a \cap S_b)}$$

Since $I(S_b \mid S_a) = I(S_b \cap S_a) - I(S_a)$,

$$\mathrm{Dist}(S_a, S_b) = 2 - \frac{I(S_a) + I(S_b)}{I(S_a \cap S_b)}$$

where

$$I(S_a) = -\sum_{i=1}^{c_a} S_i \log_2 S_i$$

$$I(S_b) = -\sum_{j=1}^{c_b} S_j \log_2 S_j$$

$$I(S_a \cap S_b) = -\sum_{i=1}^{c_a}\sum_{j=1}^{c_b} S_{ij} \log_2 S_{ij}$$

$$S_i = \frac{|C_i|}{N}, \qquad |C_i| = \text{total count of class } i, \qquad N = \text{total number of instances}, \qquad S_{ij} = S_i \times S_j$$
It chooses the cut point that minimizes the distance. As a stopping criterion, it uses the minimum description length (MDL) criterion discussed previously to determine whether more cut points should be added.
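For concreteness, the sketch below evaluates candidate cut points with the Mantaras distance in its classical reading: the distance is computed between the class partition and the two-interval partition induced by a candidate cut, with the intersection term estimated from the joint block frequencies. The function names (`mantaras_distance`, `best_cut_point`) are illustrative, not taken from any library.

```python
import math
from collections import Counter


def entropy(probs):
    """Shannon entropy (base 2) of a list of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)


def mantaras_distance(part_a, part_b):
    """Mantaras distance between two partitions of the same N instances.

    part_a, part_b: equal-length sequences assigning each instance to a block
    of the respective partition (e.g. class labels vs. interval ids).
    """
    n = len(part_a)
    p_a = [c / n for c in Counter(part_a).values()]                 # S_i
    p_b = [c / n for c in Counter(part_b).values()]                 # S_j
    p_ab = [c / n for c in Counter(zip(part_a, part_b)).values()]   # joint S_ij
    i_ab = entropy(p_ab)                                            # I(S_a ∩ S_b)
    if i_ab == 0.0:
        return 0.0
    return 2.0 - (entropy(p_a) + entropy(p_b)) / i_ab


def best_cut_point(values, labels):
    """Return the candidate cut (midpoint between consecutive distinct values)
    whose induced binary partition is closest, in Mantaras distance, to the
    class partition."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    xs = [values[k] for k in order]
    ys = [labels[k] for k in order]
    best_cut, best_dist = None, float("inf")
    for k in range(1, len(xs)):
        if xs[k - 1] == xs[k]:
            continue  # no boundary between equal values
        cut = (xs[k - 1] + xs[k]) / 2.0
        split = [0 if x <= cut else 1 for x in xs]
        d = mantaras_distance(ys, split)
        if d < best_dist:
            best_cut, best_dist = cut, d
    return best_cut, best_dist
```

In a full discretizer, the selected cut would then be accepted or rejected by the MDL stopping criterion mentioned above, and the procedure repeated on the resulting intervals.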
PKID [122]
In order to maintain both low bias and low variance in a learning scheme, it is advisable to increase both the interval frequency and the number of intervals as the amount of training data increases. A good way to achieve this is to set the interval frequency and the interval number equally proportional to the amount of training data. This is the main purpose of proportional discretization (PKID).
When discretizing a continuous attribute for which there are N instances, supposing that the desired interval frequency is s and the desired interval number is t, PKID calculates s and t by the following expressions:
$$s \times t = N$$

$$s = t$$

so that $s = t = \sqrt{N}$: the attribute is divided into roughly $\sqrt{N}$ intervals, each containing roughly $\sqrt{N}$ instances.
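A minimal sketch of this scheme, assuming the usual equal-frequency realization in which both s and t are set to about √N (the helper name `pkid_cut_points` is illustrative):

```python
import math


def pkid_cut_points(values):
    """Proportional discretization sketch: choose both the interval frequency s
    and the interval count t to be about sqrt(N), then place cut points so that
    each interval holds roughly s instances (equal-frequency binning)."""
    xs = sorted(values)
    n = len(xs)
    t = max(1, int(math.sqrt(n)))   # desired number of intervals
    s = max(1, n // t)              # desired instances per interval
    cuts = []
    for k in range(1, t):
        i = k * s
        if i >= n:
            break
        # put each boundary halfway between neighbouring sorted values
        cuts.append((xs[i - 1] + xs[i]) / 2.0)
    return cuts


# Example: with N = 100 training values, this aims for about 10 intervals
# of about 10 instances each; both quantities grow as more data arrives,
# which is the bias/variance trade-off described above.
```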