Tree design approaches are aimed at finding “optimal” solutions: minimum-sized
trees with high classification accuracy [12, 130, 177]. The greedy approach is
followed because a search over the whole set of possible trees for a given problem
is almost always impractical (finding optimal binary trees is NP-complete [108]).
There is, however, a shortcoming of the greedy designs [130]: whereas node
performance is a linear function of the class recognition rates, the overall tree
performance is usually highly nonlinear. Since the greedy design is not guaranteed
to be “optimal” in any sense, the accuracy, efficiency and generalization capability
of the final solution rely on some sort of tree pruning, i.e., removing subtrees with
no statistically significant contribution to the final performance, thereby remedying
the usual over-fitting of the final solution. Even with pruning, final tree solutions
are often found to be more complex than they should be [116].
The MEE trees we are about to describe are constructed relying on the (discrete)
Shannon MEE data splitter, studied in Chap. 4, for the search of the best splits.
Compared to the classic data splitting rules (presented in Sect. 4.1.4), the MEE
approach has an important advantage: one is able to devise a stopping rule based
on an implicit distribution overlap measure, such as the interval-end hit rate
(Sect. 4.1.3.2). In other words, by using MEE splitters one implicitly contributes
to better generalization, making the tree design less dependent on pruning.
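To make the split criterion concrete, the following is a minimal sketch of the quantity minimized by such a splitter, assuming the discrete two-class formulation of Chap. 4: the splitter error E = T − Z takes the value 1 with probability P_{10}, −1 with probability P_{01} and 0 otherwise (P_{10} and P_{01} are the two misclassification rates defined in step 2 of the algorithm below). The function name error_entropy is illustrative only, and the natural logarithm is used; the base merely rescales the entropy and does not change its minimizer.

```python
import numpy as np

def error_entropy(p10: float, p01: float) -> float:
    """Empirical Shannon entropy of the splitter error E = T - Z, assumed to
    take the value 1 with probability p10, -1 with probability p01 and 0
    otherwise (correct split decisions)."""
    probs = np.array([p10, p01, 1.0 - p10 - p01])
    probs = probs[probs > 0]              # convention: 0 * log 0 = 0
    return float(-np.sum(probs * np.log(probs)))

# Example: a split with 10% of one type of error and 5% of the other
print(error_entropy(0.10, 0.05))          # about 0.518 nats
```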
6.6.1 The MEE Tree Algorithm
The greedy algorithm for growing a MEE tree has the following main steps
[152]:
1. Starting at the root node (where the whole training set is used), at each
tree node one has an n × d (n cases, d features) matrix X and an n × c
(n cases, c classes) matrix T, filled with zeros and ones. A univariate split
z (with parameter Δ or B) minimizing EE is searched for in the d × c-dimensional
space.
2. For that purpose, the error rates P_{10} = n_{10}/n, P_{01} = n_{01}/n
(n_{t t̄}: number of class t cases classified as t̄) are computed for each
candidate class label t ∈ {0, 1}. (T = {0, 1} is computationally easier to use.)
3. The rule at node u_j (therefore, the triple {x_{·j}, t_{·k}, Δ_j or B_j})
corresponding to the smallest empirical error entropy is assigned to the node,
and if a stopping criterion is satisfied the node becomes a leaf. Otherwise,
the left and right node (u_{jl}, u_{jr}) sets are generated and steps 1 and 2
are iterated.
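A minimal sketch of this per-node search is given below, for numerical features and two classes only. It follows steps 1 to 3 directly: candidate thresholds Δ are taken as midpoints of consecutive distinct feature values (as discussed in the paragraph that follows), both candidate class labels are tried for each threshold, and the triple with the smallest empirical error entropy is returned. The names find_mee_split and shannon_ee are illustrative and not from [152]; categorical rules (the B parameters), the general c-class case (iterating over the candidate label columns t_{·k}) and the stopping criterion are omitted.

```python
import numpy as np

def shannon_ee(p10: float, p01: float) -> float:
    """Shannon entropy of the splitter error: value 1 with probability p10,
    -1 with probability p01, 0 otherwise."""
    p = np.array([p10, p01, 1.0 - p10 - p01])
    p = p[p > 0]                                       # convention: 0 * log 0 = 0
    return float(-np.sum(p * np.log(p)))

def find_mee_split(X, t):
    """Greedy MEE split search at one node (numerical features, two classes).
    X: n x d array with the cases reaching the node; t: n-vector of 0/1 labels.
    Returns the triple (feature index j, class label of the 'right' branch,
    threshold delta) with the smallest empirical error entropy, and that EE."""
    d = X.shape[1]
    best_triple, best_ee = None, np.inf
    for j in range(d):
        values = np.unique(X[:, j])
        # candidate thresholds: midpoints of consecutive distinct feature values
        for delta in (values[:-1] + values[1:]) / 2.0:
            right = X[:, j] > delta
            for label in (0, 1):                       # candidate class label of the right branch
                z = np.where(right, label, 1 - label)  # split decision per case
                p10 = np.mean((t == 1) & (z == 0))     # class 1 cases decided as 0
                p01 = np.mean((t == 0) & (z == 1))     # class 0 cases decided as 1
                ee = shannon_ee(p10, p01)
                if ee < best_ee:
                    best_triple, best_ee = (j, label, delta), ee
    return best_triple, best_ee

# Toy usage: one feature, perfectly separable labels
X = np.array([[1.0], [2.0], [3.0], [4.0]])
t = np.array([0, 0, 1, 1])
triple, ee = find_mee_split(X, t)
print(triple, ee)   # perfect split: feature 0, right-branch label 1, threshold 2.5, EE 0
```

Growing the full tree then amounts to applying find_mee_split recursively to the left and right node sets until the stopping criterion fires, as in step 3.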
The error rates in step 2 are computed for distinct values of every feature
x_{·j}, j = 1, ..., d, serving as candidate split points: Δ_j for numerical-type
rules (often the midpoints of consecutive training set values are used) and B_j for
combinations of categorical values in the case of categorical variables. A tree