node u_j, represents a binary test z_j as {x_ij ≤ Δ ⇒ z_j(x_i) = ω_k; ω̄_k otherwise} for numerical inputs (i.e., real-valued inputs), or as {x_ij ∈ B ⇒ z_j(x_i) = ω_k; ω̄_k otherwise} for categorical inputs; Δ and B are, respectively, a numerical threshold and a set of categories. The application of the binary test at a tree node u sends the inputted data either to a left node u_l (if x_ij ≤ Δ or x_ij ∈ B) or to a right node u_r (if x_ij > Δ or x_ij ∉ B). The left and right nodes, u_l and u_r, are called the children nodes of u (the parent node). Thus, the application of successive binary tests sends a data instance traveling down the tree until it reaches some tree leaf, where a final class label is assigned. Details on the construction of binary trees can be found in [33, 188]. In the following we describe three classic univariate splits, often used in the construction of binary decision trees, and compare them with MEE splits [152].
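As a minimal sketch, the two kinds of binary test can be written as follows (the attribute index, threshold, category set, and data instances are hypothetical examples, not taken from the text):

```python
# Sketch of the two univariate binary tests described above (illustrative only).
# A numerical test compares attribute j of instance x with a threshold delta;
# a categorical test checks membership of attribute j in a category set B.

def numerical_test(x, j, delta):
    """Route x to the left child if x[j] <= delta, else to the right child."""
    return "left" if x[j] <= delta else "right"

def categorical_test(x, j, B):
    """Route x to the left child if x[j] is in the set B, else to the right."""
    return "left" if x[j] in B else "right"

# Hypothetical instances and test parameters:
print(numerical_test([5.1, 3.5], 0, 5.0))                       # right
print(categorical_test(["red", "small"], 0, {"red", "green"}))  # left
```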
4.1.4.1 Classic Splitting Criteria
The many univariate splits (also named splitting criteria or rules) proposed to date fall into two main categories: impurity-based criteria and binary criteria [188, 147]. All of them are defined in terms of the class-conditional PMFs, P(ω_i|u), at a node u. In contrast, MEE is not defined in terms of the P(ω_i|u); MEE is post-rule defined, in terms of the P(ω_i|r), where r denotes a splitting rule.
An impurity function of a node u, φ(u) ≡ φ(P(ω_1|u), ..., P(ω_c|u)), is a symmetric function measuring the degree of "disorder" of its PMF, in such a way that it achieves its maximum at the uniform distribution and its minimum at a Dirac-delta distribution. Classic impurity functions are the Gini diversity index,

GDI(u) = Σ_{j=1..c} Σ_{k=1..c, k≠j} P(ω_j|u) P(ω_k|u),

and, of course, the (discrete) Shannon entropy (also called "information" in decision tree jargon),

H(u) = −Σ_{k=1..c} P(ω_k|u) ln P(ω_k|u).
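Both impurity functions are easily computed from a node's class-conditional PMF; the sketch below assumes the PMF is given as a list of probabilities summing to one:

```python
import math

def gdi(p):
    """Gini diversity index: sum over j != k of p[j]*p[k], i.e. 1 - sum p[k]^2."""
    return 1.0 - sum(pk * pk for pk in p)

def entropy(p):
    """Discrete Shannon entropy H(u) = -sum_k p[k] * ln p[k] (0 ln 0 taken as 0)."""
    return -sum(pk * math.log(pk) for pk in p if pk > 0)

# Maximum at the uniform PMF, minimum at a Dirac-delta PMF:
print(gdi([0.5, 0.5]), entropy([0.5, 0.5]))  # 0.5 and ln 2 ≈ 0.693
print(gdi([1.0, 0.0]), entropy([1.0, 0.0]))  # both evaluate to zero
```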
Consider a node u, a candidate for splitting into right and left nodes, u_r and u_l. The average impurity is

φ_z(u) = P(u_r|u, z) φ(u_r) + P(u_l|u, z) φ(u_l).

One is then interested in maximizing Δ_z φ(u) = φ(u) − φ_z(u) at each node. When using GDI(u) and H(u) we obtain the Gini index (GI) and information gain (IG) split criteria, respectively. Since GDI(u) is a first-order approximation of H(u), both GI and IG behave similarly, allowing us to focus solely on IG in the sequel [35, 178].
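A minimal sketch of the impurity decrease Δ_z φ(u), here with φ = H computed from class counts (the counts below are hypothetical):

```python
import math

def entropy_from_counts(counts):
    """H(u) computed from class counts: -sum_k (n_k/n) ln (n_k/n)."""
    n = sum(counts)
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)

def impurity_decrease(parent, left, right, phi=entropy_from_counts):
    """Delta_z phi(u) = phi(u) - [P(u_l|u,z) phi(u_l) + P(u_r|u,z) phi(u_r)]."""
    n, nl, nr = sum(parent), sum(left), sum(right)
    return phi(parent) - (nl / n) * phi(left) - (nr / n) * phi(right)

# Hypothetical split of 40 instances into two fairly pure children:
gain = impurity_decrease(parent=[20, 20], left=[18, 2], right=[2, 18])
print(gain > 0)  # True: the split reduces the average entropy (this is the IG)
```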
The probability of error at u, PE(u) = 1 − max_j P(ω_j|u), is also an impurity function, since it enjoys all the above properties. However, as shown in [33], for tree design PE(u) has two serious drawbacks: it may lead to ω(u) = ω(u_r) = ω(u_l) for all candidate splits, in which case no best split can be found; and it may lead to rejecting solutions with purer children nodes in favor of others with highly impure nodes, the reason for this shortcoming being the non-strictly concave behavior of PE(·) (GDI and H are strictly concave functions).
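The first drawback can be illustrated with a hypothetical two-class example: whenever both children keep the parent's majority class, the PE decrease is exactly zero, while the strictly concave entropy still rewards the purer child (the class counts below are hypothetical):

```python
import math

def pe(counts):
    """Probability-of-error impurity: PE(u) = 1 - max_j P(omega_j | u)."""
    n = sum(counts)
    return 1.0 - max(counts) / n

def entropy(counts):
    """Shannon entropy computed from class counts."""
    n = sum(counts)
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)

def decrease(phi, parent, left, right):
    """Impurity decrease of a candidate split under impurity function phi."""
    n, nl, nr = sum(parent), sum(left), sum(right)
    return phi(parent) - (nl / n) * phi(left) - (nr / n) * phi(right)

# Parent [70, 30] split into [40, 10] and [30, 20]: the first class stays the
# majority in both children, so PE registers no impurity decrease at all,
# whereas entropy prefers this split to doing nothing.
parent, left, right = [70, 30], [40, 10], [30, 20]
print(abs(decrease(pe, parent, left, right)) < 1e-9)   # True: no PE gain
print(decrease(entropy, parent, left, right) > 0)      # True: positive IG
```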