node $u_j$, represents a binary test $z_j$ as $\{x_{ij} \le \Delta \Rightarrow z_j(x_i) = \omega_k;\ \bar\omega_k \text{ otherwise}\}$ for numerical inputs (i.e., real-valued inputs) or as $\{x_{ij} \in B \Rightarrow z_j(x_i) = \omega_k;\ \bar\omega_k \text{ otherwise}\}$ for categorical inputs; $\Delta$ and $B$ are, respectively, a numerical threshold and a set of categories. The application of the binary test at a tree node $u$ sends the inputted data either to a left node $u_l$ (if $x_{ij} \le \Delta$ or $x_{ij} \in B$) or to a right node $u_r$ (if $x_{ij} > \Delta$ or $x_{ij} \notin B$). The left and right nodes, $u_l$ and $u_r$, are called the children nodes of $u$ (the parent node). Thus, the application of successive binary tests sends a data instance traveling down the tree until it reaches some tree leaf, where a final class label is assigned. Details on the construction of binary trees can be found in [33, 188]. In the following we describe three classic univariate splits, often used in the construction of binary decision trees, and compare them with SEE splits [152].
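To make the routing concrete, the following minimal Python sketch applies such univariate tests to send an instance down a tree. The `Node` and `route` names and the example stump are illustrative assumptions, not constructs from the text.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    feature: Optional[int] = None        # index j of the tested input x_ij
    threshold: Optional[float] = None    # Delta, for numerical inputs
    categories: Optional[set] = None     # B, for categorical inputs
    left: Optional["Node"] = None        # u_l: x_ij <= Delta or x_ij in B
    right: Optional["Node"] = None       # u_r: x_ij > Delta or x_ij not in B
    label: Optional[str] = None          # final class label, set only at leaves

def route(node: Node, x) -> str:
    """Apply successive binary tests until the instance x reaches a leaf."""
    while node.label is None:
        v = x[node.feature]
        if node.threshold is not None:   # numerical test
            node = node.left if v <= node.threshold else node.right
        else:                            # categorical test
            node = node.left if v in node.categories else node.right
    return node.label

# A one-split tree (stump) testing x_i0 <= 2.5:
stump = Node(feature=0, threshold=2.5,
             left=Node(label="omega_k"), right=Node(label="omega_k_bar"))
print(route(stump, [1.0]))  # omega_k
print(route(stump, [3.0]))  # omega_k_bar
```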
4.1.4.1 Classic Splitting Criteria
The many univariate splits (also named splitting criteria or rules) proposed to date fall into two main categories: impurity-based criteria and binary criteria [188, 147]. All of them are defined in terms of the class-conditional PMFs, $P(\omega_i|u)$, at a node $u$. In contrast, MEE is not defined in terms of the $P(\omega_i|u)$; MEE is post-rule defined, in terms of the $P(\omega_i|r)$, where $r$ denotes a splitting rule.
An impurity function of a node $u$, $\phi(u) \equiv \phi(P(\omega_1|u), \ldots, P(\omega_c|u))$, is a symmetric function measuring the degree of "disorder" of its PMF in such a way that it achieves its maximum at the uniform distribution and its minimum at a Dirac-delta distribution. Classic impurity functions are the Gini diversity index,

$$GDI(u) = \sum_{j=1}^{c} \sum_{k=1, k \neq j}^{c} P(\omega_j|u)\, P(\omega_k|u),$$

and, of course, the (discrete) Shannon entropy (also called "information" in decision tree jargon),

$$H(u) = -\sum_{k=1}^{c} P(\omega_k|u) \ln P(\omega_k|u).$$

Consider a node $u$, a candidate for splitting into right and left nodes, $u_r$ and $u_l$. The average impurity after applying a test $z$ is $\phi_z(u) = P(u_r|u, z)\,\phi(u_r) + P(u_l|u, z)\,\phi(u_l)$. One is then interested in maximizing $\Delta_z \phi(u) = \phi(u) - \phi_z(u)$ at each node. When using $GDI(u)$ and $H(u)$ we obtain the Gini index (GI) and information gain (IG) split criteria, respectively. Since $GDI(u)$ is the first-order approximation of $H(u)$, both GI and IG behave similarly, allowing us to focus solely on IG in the sequel [35, 178].
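As a concrete illustration, here is a small Python sketch that computes both impurity functions and the impurity decrease $\Delta_z \phi(u)$ for a candidate split; the helper names and the example probabilities are assumptions for illustration, not from the source.

```python
import math

def gdi(p):
    """Gini diversity index: 1 - sum_k P(w_k|u)^2 (equal to the double sum above)."""
    return 1.0 - sum(q * q for q in p)

def shannon(p):
    """Discrete Shannon entropy H(u) = -sum_k P(w_k|u) ln P(w_k|u)."""
    return -sum(q * math.log(q) for q in p if q > 0)

def impurity_decrease(phi, p_u, p_l, p_r, w_l):
    """Delta_z phi(u) = phi(u) - [P(u_l|u,z) phi(u_l) + P(u_r|u,z) phi(u_r)]."""
    return phi(p_u) - (w_l * phi(p_l) + (1.0 - w_l) * phi(p_r))

# A split sending half of the data of a balanced two-class node to each child:
p_u, p_l, p_r, w_l = [0.5, 0.5], [0.9, 0.1], [0.1, 0.9], 0.5
print(impurity_decrease(gdi, p_u, p_l, p_r, w_l))      # GI criterion value
print(impurity_decrease(shannon, p_u, p_l, p_r, w_l))  # IG criterion value
```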
The probability of error at $u$, $PE(u) = 1 - \max_j P(\omega_j|u)$, is also an impurity function, since it enjoys all the above properties. However, as shown in [33], for tree design $PE(u)$ has two serious drawbacks: it may lead to $\omega(u) = \omega(u_r) = \omega(u_l)$ for all candidate splits, in which case no best split can be found; and it may reject solutions with purer children nodes in favor of others with highly impure nodes, the reason for this shortcoming being the non-strictly concave behavior of $PE(\cdot)$ ($GDI$ and $H$ are strictly concave functions).
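The second drawback can be checked numerically. In the hypothetical example below (counts chosen for illustration, in the spirit of the discussion in [33]), two splits of a balanced two-class node achieve the same average probability of error, so $PE$ cannot rank them, while the strictly concave Gini criterion favors the split that produces a pure child.

```python
def pe(counts):
    """Probability of error at a node given class counts: 1 - max_j P(w_j|u)."""
    n = sum(counts)
    return 1.0 - max(counts) / n

def gini(counts):
    """Gini diversity index computed from class counts."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

def weighted(phi, left, right):
    """Average children impurity phi_z(u) of a split given per-child counts."""
    n = sum(left) + sum(right)
    return sum(left) / n * phi(left) + sum(right) / n * phi(right)

# Two candidate splits of a parent node with class counts (400, 400):
split_a = ([300, 100], [100, 300])  # both children remain impure
split_b = ([200, 400], [200, 0])    # one child becomes pure

print(weighted(pe, *split_a), weighted(pe, *split_b))      # 0.25   0.25
print(weighted(gini, *split_a), weighted(gini, *split_b))  # 0.375  0.333...
```

Both splits lower $PE(u) = 0.5$ by the same amount, yet only the Gini criterion detects that the second split yields a pure child.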