Information Technology Reference
In-Depth Information
SEE is
not
a node impurity function. Whereas a node impurity function
depends solely on the
P
(
ω
i
|
u
), SEE depends on the “error” of a specific split
involving parent and children nodes. Moreover, whereas node impurity func-
tions have a
c
-dimensional support for any arbitrary
c
,SEEalwayshasa
3-point support.
Binary splitting criteria are based on partitions of feature domains into
two sub-domains. The classic binary criterion is the “towing” splitting rule.
Corresponds to maximizing
TWO
(
u
)=
P
(
u
r
|
u
)[
ω
i
∈Ω
|
u
)
P
(
u
l
|
P
(
ω
i
|
u
r
)
−
]
2
, measuring how far apart are the numbers of cases of each class
assigned to each of the two subdomains,
u
r
and
u
l
.
P
(
ω
i
|
u
l
)
|
4.1.4.2
Comparing IG, TWO, PE, and SEE
The comparative study of the four splitting criteria requires a common frame-
work expressing them all uniformly. We start by expressing all criteria in terms
of a minimization problem (one could of course express them all as a maximiza-
tion problem). For the impurity criterion
IG
(
u
) (and
GI
(
u
)),since
φ
(
u
) doesn't
depend on the rule, we minimize
φ
z
(
u
) instead of maximizing
Δ
z
φ
(
u
);forthe
towing criterion we minimize max(
TWO
(
u
))
TWO
(
u
). For simplicity reasons
we keep the names
IG
(
u
) and
TWO
(
u
) for the min-expressed rules.
−
n
1
,n
0
Children
ω
1
n
1
r
n
1
l
n
1
a>
Δ?
No
Yes
ω
0
n
0
r
n
0
l
n
0
n
1
l
,n
0
l
n
1
r
,n
0
r
n
r
n
l
Fig. 4.11
Two-class framework.
ω
1
, and the splitting of a node
with
n
1
instances from class 1 and
n
0
from class 0 as shown in Fig. 4.11.
At the parent node, with
n
=
n
1
+
n
0
instances, we have:
P
(
ω
1
)=
n
1
/n
=
p
;
P
(
ω
0
)=
n
0
/n
=1
Next, we consider two classes,
ω
1
and
ω
0
≡
p
=
q
.Moreover,foranyprior
p
and whatever
criterion being used, we consider that a certain percentage
α
of
ω
1
instances
are assigned to the right node and a certain percentage
β
of
ω
0
instances are
assigned to the left node:
n
1
r
=
αn
1
;
n
0
l
=
βn
0
. In other words, assuming
that the labels assigned to the right and left nodes are respectively
ω
1
and
ω
0
,
α
and
β
represent the percentages of correctly classified instances. With
these provisions we achieve what we were looking for: all splitting rules are
expressed in terms of the minimization of a function depending on the same
parameters,
p
,
α
and
β
(the least possible number of parameters), as follows:
−