Popular Decision Trees Induction Algorithms - Data Mining with Decision Trees: Theory and Applications

compares C4.5 with J48 and C5.0 [Moore et al. (2009)] indicates that C4.5 performs consistently better (in terms of accuracy) than C5.0 and J48, particularly on small datasets.

7.4 CART

CART stands for Classification and Regression Trees. It was developed by Breiman et al. (1984) and is characterized by the fact that it constructs binary trees, namely each internal node has exactly two outgoing edges. The splits are selected using the Twoing Criterion and the obtained tree is pruned by Cost-Complexity Pruning. When provided, CART can consider misclassification costs in the tree induction. It also enables users to provide a prior probability distribution.
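As a minimal sketch of how a candidate binary split could be scored, the following Python function computes the twoing value in the form it is usually stated, (P_L * P_R / 4) * (sum over classes j of |p(j|L) - p(j|R)|)^2, where P_L and P_R are the fractions of instances sent to the left and right child. The function name `twoing` and its list-of-labels interface are assumptions made for illustration; they are not taken from the text.

```python
from collections import Counter


def twoing(left_labels, right_labels):
    """Twoing value of a binary split (hypothetical helper for illustration).

    left_labels / right_labels: class labels of the instances that the
    candidate split sends to the left and right child, respectively.
    """
    n_left, n_right = len(left_labels), len(right_labels)
    n = n_left + n_right
    if n_left == 0 or n_right == 0:
        return 0.0  # a split that leaves one child empty separates nothing

    left_counts, right_counts = Counter(left_labels), Counter(right_labels)
    classes = set(left_counts) | set(right_counts)

    # Sum over classes of the absolute difference in class proportions.
    diff = sum(
        abs(left_counts[c] / n_left - right_counts[c] / n_right)
        for c in classes
    )
    return (n_left / n) * (n_right / n) / 4.0 * diff ** 2


if __name__ == "__main__":
    print(twoing(["a", "a"], ["b", "b"]))  # perfectly separating split
    print(twoing(["a", "b"], ["a", "b"]))  # uninformative split
```

Under this sketch, a tree builder would evaluate every candidate binary split at a node and keep the one with the largest twoing value.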

An important feature of CART is its ability to generate regression trees. In regression trees, the leaves predict a real number and not a class. In the case of regression, CART looks for splits that minimize the squared prediction error (the least-squared deviation). The prediction in each leaf is based on the weighted mean for the node.
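The least-squares split search can be sketched for a single numeric attribute as follows. This is an illustrative simplification, not CART's full procedure: it assumes unit instance weights (so the weighted mean reduces to the plain mean), and the names `sse` and `best_regression_split` are hypothetical.

```python
def sse(values):
    """Sum of squared deviations of the values from their mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_regression_split(xs, ys):
    """Return (threshold, error, left_mean, right_mean) for the split
    x <= threshold that minimizes the total squared error, or None if
    all x values are identical and no split is possible."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot place a threshold between equal x values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        error = sse(left) + sse(right)
        if best is None or error < best[1]:
            # Each leaf predicts the mean target value of its instances.
            best = (threshold, error,
                    sum(left) / len(left), sum(right) / len(right))
    return best


if __name__ == "__main__":
    print(best_regression_split([1, 2, 3, 4], [1.0, 1.0, 5.0, 5.0]))
```

In a full tree builder, the same search would be repeated over all input attributes and applied recursively to each child node.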

7.5 CHAID

Starting from the early Seventies, researchers in applied statistics developed procedures for generating decision trees [Kass (1980)]. Chi-squared Automatic Interaction Detection (CHAID) was originally designed to handle nominal attributes only. For each input attribute a_i, CHAID finds the pair of values in V_i that is least significantly different with respect to the target attribute. The significance of the difference is measured by the p value obtained from a statistical test. The statistical test used depends on the type of target attribute. An F test is used if the target attribute is continuous; a Pearson chi-squared test if it is nominal; and a likelihood ratio test if it is ordinal.

For each selected pair of values, CHAID checks if the p value obtained is greater than a certain merge threshold. If the answer is positive, it merges the values and searches for an additional potential pair to be merged. The process is repeated until no remaining pair has a p value above the merge threshold.
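The merging loop can be sketched in Python under some simplifying assumptions: the target is nominal with exactly two classes, so each pair of values gives a 2x2 table and the Pearson chi-squared test has one degree of freedom, which lets the p value be computed with `math.erfc`. The names `chi2_p_value_2x2` and `merge_categories`, the dictionary input format, and the default threshold of 0.05 are all illustrative choices, not part of the original description.

```python
import math
from itertools import combinations


def chi2_p_value_2x2(row_a, row_b):
    """p value of Pearson's chi-squared test on a 2x2 table (1 d.o.f.).

    row_a, row_b: (count_class0, count_class1) for two groups of values.
    """
    total = sum(row_a) + sum(row_b)
    col_totals = [row_a[0] + row_b[0], row_a[1] + row_b[1]]
    stat = 0.0
    for row in (row_a, row_b):
        for j in range(2):
            expected = sum(row) * col_totals[j] / total
            if expected > 0:
                stat += (row[j] - expected) ** 2 / expected
    # Survival function of the chi-squared distribution with 1 d.o.f.
    return math.erfc(math.sqrt(stat / 2.0))


def merge_categories(counts, merge_threshold=0.05):
    """Repeatedly merge the least significantly different pair of groups.

    counts: dict mapping each attribute value to (count_class0, count_class1).
    Returns a dict mapping tuples of merged values to their summed counts.
    """
    groups = {(value,): tuple(c) for value, c in counts.items()}
    while len(groups) > 1:
        # The least significantly different pair has the largest p value.
        (g1, g2), p = max(
            ((pair, chi2_p_value_2x2(groups[pair[0]], groups[pair[1]]))
             for pair in combinations(groups, 2)),
            key=lambda item: item[1],
        )
        if p <= merge_threshold:
            break  # every remaining pair differs significantly
        merged = tuple(a + b for a, b in zip(groups[g1], groups[g2]))
        del groups[g1], groups[g2]
        groups[g1 + g2] = merged
    return groups


if __name__ == "__main__":
    example = {"a": (30, 10), "b": (28, 12), "c": (5, 35)}
    print(merge_categories(example))
```

In the example data, values "a" and "b" have similar class distributions and are merged, while "c" stays separate. Handling targets with more classes, or continuous and ordinal targets, would require the other tests named above.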

The best input attribute to be used for splitting the current node is then selected, such that each child node is made of a group of homogeneous values of the selected attribute. Note that no split is performed if the adjusted p value of the best input attribute is not less than a certain split
