Information Technology Reference
In-Depth Information
SEE is not a node impurity function. Whereas a node impurity function
depends solely on the P ( ω i |
u ), SEE depends on the “error” of a specific split
involving parent and children nodes. Moreover, whereas node impurity func-
tions have a c -dimensional support for any arbitrary c ,SEEalwayshasa
3-point support.
Binary splitting criteria are based on partitions of feature domains into
two sub-domains. The classic binary criterion is the “towing” splitting rule.
Corresponds to maximizing TWO ( u )= P ( u r |
u )[ ω i ∈Ω |
u ) P ( u l |
P ( ω i |
u r )
] 2 , measuring how far apart are the numbers of cases of each class
assigned to each of the two subdomains, u r and u l .
P ( ω i |
u l )
|
4.1.4.2
Comparing IG, TWO, PE, and SEE
The comparative study of the four splitting criteria requires a common frame-
work expressing them all uniformly. We start by expressing all criteria in terms
of a minimization problem (one could of course express them all as a maximiza-
tion problem). For the impurity criterion IG ( u ) (and GI ( u )),since φ ( u ) doesn't
depend on the rule, we minimize φ z ( u ) instead of maximizing Δ z φ ( u );forthe
towing criterion we minimize max( TWO ( u ))
TWO ( u ). For simplicity reasons
we keep the names IG ( u ) and TWO ( u ) for the min-expressed rules.
n 1 ,n 0
Children
ω 1
n 1 r
n 1 l
n 1
a> Δ?
No
Yes
ω 0
n 0 r
n 0 l
n 0
n 1 l ,n 0 l
n 1 r ,n 0 r
n r
n l
Fig. 4.11
Two-class framework.
ω 1 , and the splitting of a node
with n 1 instances from class 1 and n 0 from class 0 as shown in Fig. 4.11.
At the parent node, with n = n 1 + n 0 instances, we have: P ( ω 1 )= n 1 /n =
p ; P ( ω 0 )= n 0 /n =1
Next, we consider two classes, ω 1 and ω 0
p = q .Moreover,foranyprior p and whatever
criterion being used, we consider that a certain percentage α of ω 1 instances
are assigned to the right node and a certain percentage β of ω 0 instances are
assigned to the left node: n 1 r = αn 1 ; n 0 l = βn 0 . In other words, assuming
that the labels assigned to the right and left nodes are respectively ω 1 and
ω 0 , α and β represent the percentages of correctly classified instances. With
these provisions we achieve what we were looking for: all splitting rules are
expressed in terms of the minimization of a function depending on the same
parameters, p , α and β (the least possible number of parameters), as follows:
 
Search WWH ::




Custom Search