Let $\beta(a_i, dom_1(a_i), dom_2(a_i), S)$ denote the binary criterion value for attribute $a_i$ over sample $S$ when $dom_1(a_i)$ and $dom_2(a_i)$ are its corresponding subdomains. The value obtained for the optimal division of the attribute domain into two mutually exclusive and exhaustive subdomains is used for comparing attributes, namely:
$$\beta(a_i, S) = \max_{dom_1(a_i);\, dom_2(a_i)} \beta(a_i, dom_1(a_i), dom_2(a_i), S) \qquad (5.11)$$
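For a nominal attribute, Eq. (5.11) amounts to an exhaustive search over all binary partitions of the attribute's values. The following is a minimal Python sketch under several assumptions: the sample is a list of `(a_i value, y value)` pairs, the criterion is passed in as a function, and the domain is taken to be the values observed in $S$ rather than the full $dom(a_i)$. The names `binary_partitions`, `best_binary_split` and the toy criterion `rate_gap` are hypothetical. With $k$ distinct values there are $2^{k-1} - 1$ candidate splits, so this brute-force search is practical only for small domains.

```python
from itertools import combinations


def binary_partitions(domain):
    """Yield each split of `domain` into two non-empty, mutually exclusive,
    exhaustive subdomains exactly once (2**(k-1) - 1 splits for k values)."""
    values = sorted(domain, key=repr)
    first, rest = values[0], values[1:]
    for r in range(len(rest) + 1):
        for extra in combinations(rest, r):
            # `first` always goes into dom1, so mirror-image splits are not repeated.
            dom1 = frozenset((first, *extra))
            dom2 = frozenset(values) - dom1
            if dom2:  # skip the trivial split whose second subdomain is empty
                yield dom1, dom2


def best_binary_split(sample, criterion):
    """Return (beta(a_i, S), dom1, dom2) for the best split, or None if the
    attribute takes fewer than two distinct values in `sample`."""
    domain = {a for a, _ in sample}
    if len(domain) < 2:
        return None
    best = None
    for dom1, dom2 in binary_partitions(domain):
        score = criterion(sample, dom1, dom2)
        if best is None or score > best[0]:
            best = (score, dom1, dom2)
    return best


# Hypothetical toy criterion: gap between the positive rates (y == 1) of the two sides.
def rate_gap(sample, dom1, dom2):
    def rate(dom):
        ys = [y for a, y in sample if a in dom]
        return sum(ys) / len(ys)
    return abs(rate(dom1) - rate(dom2))


sample = [("a", 1), ("a", 1), ("b", 0), ("c", 0)]
print(best_binary_split(sample, rate_gap))
```

Any binary criterion with the same `(sample, dom1, dom2)` signature, such as the twoing or ORT criteria discussed later in this section, can be plugged in place of `rate_gap`.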
5.1.11 Twoing Criterion
The Gini index may encounter problems when the domain of the target attribute is relatively wide [Breiman et al. (1984)]. In such cases, it is possible to employ a binary criterion called the twoing criterion. This criterion is defined as:
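$$
\mathit{twoing}(a_i, dom_1(a_i), dom_2(a_i), S) = 0.25 \cdot \frac{|\sigma_{a_i \in dom_1(a_i)} S|}{|S|} \cdot \frac{|\sigma_{a_i \in dom_2(a_i)} S|}{|S|} \cdot \left( \sum_{c_i \in dom(y)} \left| \frac{|\sigma_{a_i \in dom_1(a_i)\ \mathrm{AND}\ y = c_i} S|}{|\sigma_{a_i \in dom_1(a_i)} S|} - \frac{|\sigma_{a_i \in dom_2(a_i)\ \mathrm{AND}\ y = c_i} S|}{|\sigma_{a_i \in dom_2(a_i)} S|} \right| \right)^2 \qquad (5.12)
$$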
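Eq. (5.12) translates almost term by term into code. The Python sketch below assumes the sample is a list of `(a_i value, y value)` pairs and that `dom1` and `dom2` are disjoint and together cover every attribute value in the sample. The function name `twoing` is hypothetical. It sums only over classes that occur in $S$, which gives the same result as summing over all of $dom(y)$, because an absent class contributes $|0 - 0| = 0$.

```python
from collections import Counter


def twoing(sample, dom1, dom2):
    """Twoing criterion of Eq. (5.12) for the split of attribute a_i into dom1 / dom2.

    `sample` is a list of (a_i value, y value) pairs; dom1 and dom2 are assumed
    to be disjoint and to cover every a_i value occurring in the sample."""
    left = Counter(y for a, y in sample if a in dom1)   # class counts in sigma_{a_i in dom1} S
    right = Counter(y for a, y in sample if a in dom2)  # class counts in sigma_{a_i in dom2} S
    n_left, n_right = sum(left.values()), sum(right.values())
    if n_left == 0 or n_right == 0:
        return 0.0  # degenerate split: the leading weight factor is zero anyway
    n = len(sample)
    # Classes absent from S contribute |0 - 0| = 0, so summing over observed classes suffices.
    classes = set(left) | set(right)
    spread = sum(abs(left[c] / n_left - right[c] / n_right) for c in classes)
    return 0.25 * (n_left / n) * (n_right / n) * spread ** 2


sample = [("a", "x"), ("a", "x"), ("a", "y"), ("b", "y")]
print(twoing(sample, {"a"}, {"b"}))  # approximately 0.0833, i.e. 1/12
```

Here the two partitions hold 3/4 and 1/4 of $S$, and the summed absolute difference of class proportions is $2/3 + 2/3 = 4/3$. The result is therefore $0.25 \cdot 0.75 \cdot 0.25 \cdot (4/3)^2 = 1/12$.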
When the target attribute is binary, the Gini and twoing criteria are equivalent. For multi-class problems, the twoing criterion prefers attributes with evenly divided splits.
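For a two-class target ($dom(y) = \{c_1, c_2\}$), the equivalence can be checked directly. Write $w_k$ for the fraction of $S$ falling into partition $k$ and $q_k$ for the fraction of that partition that belongs to class $c_1$. This shorthand is introduced here for the derivation and is not used elsewhere in the text. The Gini index of a two-class set with class-$c_1$ fraction $q$ is $1 - q^2 - (1-q)^2 = 2q(1-q)$. The Gini gain of a split is the Gini index of $S$ minus the size-weighted Gini indices of the two partitions:

$$
\begin{aligned}
w_k &= \frac{|\sigma_{a_i \in dom_k(a_i)} S|}{|S|}, \qquad
q_k = \frac{|\sigma_{a_i \in dom_k(a_i)\ \mathrm{AND}\ y = c_1} S|}{|\sigma_{a_i \in dom_k(a_i)} S|}, \qquad k = 1, 2,\\
\mathit{twoing} &= 0.25\, w_1 w_2 \bigl(|q_1 - q_2| + |(1 - q_1) - (1 - q_2)|\bigr)^2 = w_1 w_2 (q_1 - q_2)^2,\\
\mathit{GiniGain} &= 2q(1 - q) - w_1 \cdot 2q_1(1 - q_1) - w_2 \cdot 2q_2(1 - q_2), \qquad q = w_1 q_1 + w_2 q_2,\\
&= 2\bigl(w_1 q_1^2 + w_2 q_2^2 - q^2\bigr) = 2\, w_1 w_2 (q_1 - q_2)^2 \qquad (\text{using } w_1 + w_2 = 1),\\
\mathit{twoing} &= \tfrac{1}{2}\, \mathit{GiniGain}.
\end{aligned}
$$

The twoing value is exactly half the Gini gain, so the two criteria rank every binary split of a two-class target identically.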
5.1.12 Orthogonal Criterion
The ORT criterion was presented by [Fayyad and Irani (1992)]. This binary criterion is defined as:
$$ORT(a_i, dom_1(a_i), dom_2(a_i), S) = 1 - \cos\theta(P_{y,1}, P_{y,2}) \qquad (5.13)$$
where $\theta(P_{y,1}, P_{y,2})$ is the angle between the two vectors $P_{y,1}$ and $P_{y,2}$. These vectors represent the probability distribution of the target attribute in the partitions $\sigma_{a_i \in dom_1(a_i)} S$ and $\sigma_{a_i \in dom_2(a_i)} S$, respectively.
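A minimal Python sketch of Eq. (5.13) follows. It assumes the same `(a_i value, y value)` sample format as the twoing criterion. The function name `ort` is hypothetical, and scoring a split with an empty side as 0 is an assumption, because the angle is undefined in that case. Class-probability vectors have non-negative entries, so $\cos\theta$ lies in $[0, 1]$. ORT therefore ranges from 0, when both partitions have identical class distributions, to 1, when the partitions share no class.

```python
import math
from collections import Counter


def ort(sample, dom1, dom2):
    """ORT criterion of Eq. (5.13): 1 - cos(theta) between the class-probability
    vectors P_{y,1} and P_{y,2} of the two partitions."""
    left = Counter(y for a, y in sample if a in dom1)
    right = Counter(y for a, y in sample if a in dom2)
    n_left, n_right = sum(left.values()), sum(right.values())
    if n_left == 0 or n_right == 0:
        return 0.0  # assumption: a split with an empty side is scored as uninformative
    classes = sorted(set(left) | set(right), key=repr)  # same coordinate order for both vectors
    p1 = [left[c] / n_left for c in classes]    # P_{y,1}
    p2 = [right[c] / n_right for c in classes]  # P_{y,2}
    dot = sum(u * v for u, v in zip(p1, p2))
    norms = math.sqrt(sum(u * u for u in p1)) * math.sqrt(sum(v * v for v in p2))
    return 1.0 - dot / norms  # probability vectors are non-zero, so norms > 0


print(ort([("a", "x"), ("b", "y")], {"a"}, {"b"}))  # 1.0: disjoint classes, orthogonal vectors
```

In contrast to twoing, ORT depends only on the direction of the two class-distribution vectors, not on the relative sizes of the partitions.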
It has been shown that this criterion outperforms information gain and the Gini index on certain classes of problems.