$$P(\omega_i, r) = \frac{n_{11}(\omega_i, r)}{n}, \qquad P_{01}(\omega_i) = P(r) - P(\omega_i, r) = \frac{n_r - n_{11}(\omega_i, r)}{n}, \qquad \text{with } P(r) = \frac{n_r}{n}. \eqno(6.63)$$
Let us consider the class union ω. For both the theoretical and empirical
PMFs the following holds:

$$P_{10}(\omega) = P(\omega, \bar{r}) = P(\omega_1, \bar{r}) + P(\omega_2, \bar{r}) = P_{10}(\omega_1) + P_{10}(\omega_2); \eqno(6.64)$$

$$P_{01}(\omega) = P(r) - P(\omega, r) = P(r) - \left[ P(\omega_1, r) + P(\omega_2, r) \right]. \eqno(6.65)$$
We now see why we need to compute $n_{11}(\omega_i, r)$ instead of
$n_{01}(\omega_i, r)$: it is simply a consequence of formula (6.65); no
similar expression could be derived in terms of $n_{01}(\omega_i, r)$. We
also see that computing the probabilities of interest for a union of classes
takes negligible time, because $n_r$ is computed only once for each feature
and is independent of the candidate class, so all that remains to be done is
the two additions and one subtraction of formulas (6.64) and (6.65).
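As a minimal illustrative sketch (the function and variable names are ours, not from the text), the empirical PMF terms of formulas (6.63)-(6.65) can be computed from counts, with the union probabilities obtained by the two additions and one subtraction noted above:

```python
import numpy as np

def pmf_terms(y, r_mask, c):
    """Empirical P(omega_c, r), P_10(omega_c), P_01(omega_c) for class c.

    y      : array of class labels of the n node instances
    r_mask : boolean array, True where the instance satisfies rule r
    """
    n = len(y)
    n_r = r_mask.sum()                    # computed once per candidate rule
    n_11 = np.sum((y == c) & r_mask)      # class-c instances satisfying r
    p_cr = n_11 / n                       # P(omega_c, r)
    p_10 = np.sum(y == c) / n - p_cr      # P(omega_c) - P(omega_c, r)
    p_01 = (n_r - n_11) / n               # formula (6.63)
    return p_cr, p_10, p_01

def union_terms(y, r_mask, c1, c2):
    """P_10 and P_01 for the union omega = omega_1 U omega_2."""
    n = len(y)
    p_1r, p10_1, _ = pmf_terms(y, r_mask, c1)
    p_2r, p10_2, _ = pmf_terms(y, r_mask, c2)
    p_10 = p10_1 + p10_2                          # formula (6.64): addition
    p_01 = r_mask.sum() / n - (p_1r + p_2r)       # formula (6.65)
    return p_10, p_01
```

Note that `r_mask.sum()` (i.e., $n_r$) does not depend on the candidate class, which is why extending the computation to class unions costs almost nothing.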
6.6.2 Application to Real-World Datasets
The performance of MEE trees applied to real-world datasets was analyzed
in [152, 153]. In both works the results of MEE trees were compared against
those obtained with classic tree algorithms: the CART algorithm [33] using
the classic splitting rules discussed in Sect. 4.1.4 (CART-GI, CART-IG, and
CART-TWO, respectively for the Gini index, the information gain, and the
Twoing criterion); and the C4.5 algorithm [177]. These algorithms remain in
widespread use and are implemented in most available decision tree software
tools.
All algorithms were run with unit misclassification costs (i.e., the error
rates are weighted equally for all classes), estimated priors and the same
minimum number of instances for a node to be split: 5. The CART and MEE
algorithms were run with Cost-Complexity Pruning (CCP) with the 'min'
criterion and 10-fold cross-validation [33]. The C4.5 algorithm was run with
Pessimistic Error Pruning (PEP) at the 25% confidence level [68]. References
[68, 34] report good performance of CCP-min relative to other pruning methods,
including PEP, which has a tendency to underprune. CCP was also found to
appropriately limit tree growth more frequently than other pruning methods
[116].
Figure 6.39 shows the CCP-pruned MEE tree solution for the Glass dataset
[13]; tree construction involved the consideration of class unions up to three
classes (the Glass dataset has six classes).
The results reported in [152] refer to 36 public real-world datasets, quite
diverse in terms of number of instances, features, and classes, as well as of