Let $\beta(a_i, dom_1(a_i), dom_2(a_i), S)$ denote the binary criterion value
for attribute $a_i$ over sample $S$ when $dom_1(a_i)$ and $dom_2(a_i)$ are its
corresponding subdomains. The value obtained for the optimal division of the
attribute domain into two mutually exclusive and exhaustive subdomains is used
for comparing attributes, namely:
$$
\beta^{*}(a_i, S) = \max_{\forall\, dom_1(a_i);\; dom_2(a_i)} \beta(a_i, dom_1(a_i), dom_2(a_i), S). \tag{5.11}
$$
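The maximization in (5.11) can be illustrated with a brute-force search. The following is a minimal sketch, assuming a nominal attribute whose domain is small enough to enumerate every two-way partition; the function name `best_binary_split` and the `criterion` callback signature are hypothetical and not taken from the text.

```python
from itertools import combinations


def best_binary_split(values, labels, criterion):
    """Return (best_score, dom1, dom2) over all two-way partitions of the
    attribute domain, or None if the domain has fewer than two values.

    `criterion(values, labels, dom1, dom2)` is an assumed callback that
    plays the role of beta in (5.11); larger scores are treated as better.
    """
    domain = sorted(set(values))
    best = None
    # Pin the first value to dom1 so each unordered partition is visited once.
    first, rest = domain[0], domain[1:]
    for r in range(len(rest)):  # r < len(rest) keeps dom2 non-empty
        for extra in combinations(rest, r):
            dom1 = {first, *extra}
            dom2 = set(domain) - dom1
            score = criterion(values, labels, dom1, dom2)
            if best is None or score > best[0]:
                best = (score, dom1, dom2)
    return best
```

The number of partitions grows exponentially with the domain size, so this exhaustive search is only practical for attributes with few distinct values.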
5.1.11 Twoing Criterion
The Gini index may encounter problems when the domain of the target
attribute is relatively wide [Breiman et al. (1984)]. In such cases, it is
possible to employ a binary criterion called the twoing criterion. This
criterion is defined as:
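$$
\mathrm{twoing}(a_i, dom_1(a_i), dom_2(a_i), S) = 0.25 \cdot \frac{\left|\sigma_{a_i \in dom_1(a_i)} S\right|}{|S|} \cdot \frac{\left|\sigma_{a_i \in dom_2(a_i)} S\right|}{|S|} \cdot \left( \sum_{c_i \in dom(y)} \left| \frac{\left|\sigma_{a_i \in dom_1(a_i)\ \mathrm{AND}\ y = c_i} S\right|}{\left|\sigma_{a_i \in dom_1(a_i)} S\right|} - \frac{\left|\sigma_{a_i \in dom_2(a_i)\ \mathrm{AND}\ y = c_i} S\right|}{\left|\sigma_{a_i \in dom_2(a_i)} S\right|} \right| \right)^{2} \tag{5.12}
$$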
When the target attribute is binary, the Gini and twoing criteria are
equivalent. For multi-class problems, the twoing criterion prefers attributes
with evenly divided splits.
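As an illustration of the twoing formula, here is a minimal sketch that evaluates it for one candidate split. It assumes the sample is given as two parallel lists (attribute values and class labels) and that the two subdomains together cover every attribute value; the function name `twoing` and its signature are illustrative only.

```python
from collections import Counter


def twoing(values, labels, dom1, dom2):
    """Twoing value of splitting the sample by membership in dom1 / dom2.

    Assumes dom1 and dom2 are disjoint and together cover all attribute
    values, so len(labels) plays the role of |S|.
    """
    left = [y for v, y in zip(values, labels) if v in dom1]
    right = [y for v, y in zip(values, labels) if v in dom2]
    if not left or not right:
        return 0.0  # degenerate split: one partition is empty
    n = len(labels)
    left_counts, right_counts = Counter(left), Counter(right)
    # Sum over classes of the absolute difference in class proportions.
    diff = sum(
        abs(left_counts[c] / len(left) - right_counts[c] / len(right))
        for c in set(labels)
    )
    return 0.25 * (len(left) / n) * (len(right) / n) * diff ** 2
```

With two equally sized partitions that each contain a single, different class, the summed difference is 2 and the value is 0.25 · 0.5 · 0.5 · 4 = 0.25; when both partitions have identical class proportions, the value is 0.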
5.1.12 Orthogonal Criterion
The ORT criterion was presented by [Fayyad and Irani (1992)]. This binary
criterion is defined as:
$$
\mathrm{ORT}(a_i, dom_1(a_i), dom_2(a_i), S) = 1 - \cos\theta(P_{y,1}, P_{y,2}), \tag{5.13}
$$
where $\theta(P_{y,1}, P_{y,2})$ is the angle between the two vectors $P_{y,1}$
and $P_{y,2}$. These vectors represent the probability distribution of the
target attribute in the partitions $\sigma_{a_i \in dom_1(a_i)} S$ and
$\sigma_{a_i \in dom_2(a_i)} S$, respectively.
It has been shown that this criterion performs better than the
information gain and the Gini index for specific constellations of problems.
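The ORT value can be computed directly from the two class-probability vectors, since the cosine of the angle between them is their dot product divided by the product of their norms. The sketch below assumes the same parallel-list representation of the sample (attribute values and class labels); the function name `ort` is illustrative and not from the text.

```python
import math
from collections import Counter


def ort(values, labels, dom1, dom2):
    """ORT value: 1 - cos(theta) between the class-probability vectors
    of the two partitions induced by dom1 and dom2."""
    left = [y for v, y in zip(values, labels) if v in dom1]
    right = [y for v, y in zip(values, labels) if v in dom2]
    if not left or not right:
        return 0.0  # degenerate split: one partition is empty
    classes = list(set(labels))  # fixed class order shared by both vectors
    left_counts, right_counts = Counter(left), Counter(right)
    p1 = [left_counts[c] / len(left) for c in classes]
    p2 = [right_counts[c] / len(right) for c in classes]
    dot = sum(a * b for a, b in zip(p1, p2))
    norm = math.sqrt(sum(a * a for a in p1)) * math.sqrt(sum(b * b for b in p2))
    return 1.0 - dot / norm
```

Because probability vectors have non-negative entries, the value lies between 0 (identical class distributions) and 1 (orthogonal distributions, where the partitions share no class).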