There are other ways to obtain random forests. For example, instead of
using all the instances to determine the best split point for each feature, a
sub-sample of the instances is used [Kamath and Cantu-Paz (2001)]. This
sub-sample varies with the feature. The feature and split value that optimize
the splitting criterion are chosen as the decision at that node. Since the split
made at a node is likely to vary with the sample selected, this technique
results in different trees which can be combined in ensembles.
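The following is a minimal, illustrative sketch of this idea, not the cited authors' code: for each feature a fresh random sub-sample of the instances is drawn, candidate thresholds are scored on that sub-sample with the Gini criterion, and the best (feature, threshold) pair is returned. The function names and the choice of Gini impurity are assumptions made for the example.

```python
import numpy as np

def gini(y):
    """Gini impurity of a label vector."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def weighted_gini(x, y, threshold):
    """Weighted Gini impurity of the two children produced by a threshold."""
    left = x <= threshold
    if left.all() or not left.any():
        return np.inf
    n = len(y)
    return left.sum() / n * gini(y[left]) + (~left).sum() / n * gini(y[~left])

def randomized_split(X, y, sample_size=50, rng=None):
    """For every feature, score candidate thresholds on a fresh random
    sub-sample of the instances; return the best (feature, threshold) pair."""
    rng = rng or np.random.default_rng()
    best_feature, best_threshold, best_score = None, None, np.inf
    for j in range(X.shape[1]):
        idx = rng.choice(len(y), size=min(sample_size, len(y)), replace=False)
        xs, ys = X[idx, j], y[idx]
        u = np.unique(xs)
        for t in (u[:-1] + u[1:]) / 2:   # midpoints between sorted values
            score = weighted_gini(xs, ys, t)
            if score < best_score:
                best_feature, best_threshold, best_score = j, t, score
    return best_feature, best_threshold
```

Because each feature is evaluated on a different random sub-sample, repeated calls on the same data can yield different splits, which is exactly what makes the resulting trees diverse enough to combine in an ensemble.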
Another method for randomization of the decision tree through histograms was proposed by Kamath et al. (2002). The use of histograms
has long been suggested as a way of making the features discrete, while
reducing the time to handle very large datasets. Typically, a histogram is
created for each feature, and the bin boundaries used as potential split
points. The randomization in this process is introduced by selecting the split
point randomly in an interval around the best bin boundary.
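A minimal sketch of the histogram-based variant for a single numeric feature follows; it is an assumed, simplified implementation rather than Kamath et al.'s code, and the interval width controlled by `jitter` is an illustrative parameter. Bin boundaries serve as candidate split points, the best boundary is found with the Gini criterion, and the final split is drawn at random from an interval around it.

```python
import numpy as np

def histogram_random_split(x, y, n_bins=10, jitter=0.5, rng=None):
    """Use histogram bin boundaries as candidate split points, then draw the
    actual split uniformly from an interval around the best boundary."""
    rng = rng or np.random.default_rng()
    edges = np.histogram_bin_edges(x, bins=n_bins)
    candidates = edges[1:-1]            # interior bin boundaries
    width = edges[1] - edges[0]         # bin width defines the jitter interval

    def gini(labels):
        _, c = np.unique(labels, return_counts=True)
        p = c / c.sum()
        return 1.0 - np.sum(p ** 2)

    def score(t):
        left = x <= t
        if left.all() or not left.any():
            return np.inf
        n = len(y)
        return left.sum() / n * gini(y[left]) + (~left).sum() / n * gini(y[~left])

    best = min(candidates, key=score)
    # Randomization step: pick the split point near the best bin boundary.
    return rng.uniform(best - jitter * width, best + jitter * width)
```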
Although the random forest was defined for decision trees, this
approach is applicable to all types of classifiers. One important advantage
of the random forest method is its ability to handle a very large number of
input attributes [Skurichina and Duin (2002)]. Another important feature
of the random forest is that it is fast.
9.4.2.4 Rotation Forest
Similarly to Random Forest, the aim of Rotation Forest is to independently
build an accurate and diverse set of classification trees. Recall that in Random
Forest the diversity among the base trees is obtained by training each tree
on a different bootstrap sample of the dataset and by randomizing the
feature choice at each node. In Rotation Forest, on the other hand, the
diversity among the base trees is achieved by training each tree on the whole
dataset in a rotated feature space. Because tree induction algorithms split
the input space using hyperplanes parallel to the feature axes, rotating the
axes just before running the tree induction algorithm may result in a
very different classification tree.
More specifically, the main idea is to use feature extraction methods
to build a full feature set for each tree in the forest. To this end, we first
randomly split the feature set into K mutually exclusive partitions. Then
we apply principal component analysis (PCA) separately to each feature
partition. PCA is a well-established statistical procedure that was invented
in 1901 by Karl Pearson. The idea of PCA is to orthogonally transform
possibly correlated features into a set of linearly uncorrelated features
(called principal components). Each component is a linear combination
of the original features.
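A minimal sketch of the rotation step for a single tree is shown below, assuming scikit-learn's PCA and DecisionTreeClassifier; the function name and parameters are illustrative, mean centering is omitted for brevity, and refinements from the original Rotation Forest algorithm (such as applying PCA to bootstrap samples of class subsets) are left out. The features are randomly partitioned into K groups, PCA is fitted on each group with all components retained, the per-group loadings are assembled into a block-diagonal rotation matrix, and the tree is trained on the rotated data.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.tree import DecisionTreeClassifier

def fit_rotation_tree(X, y, K=3, rng=None):
    """Train a single decision tree on a PCA-rotated copy of the data."""
    rng = rng or np.random.default_rng()
    n_features = X.shape[1]
    order = rng.permutation(n_features)
    partitions = np.array_split(order, K)       # K mutually exclusive partitions

    # Block-diagonal rotation matrix: one PCA (all components kept) per partition.
    R = np.zeros((n_features, n_features))
    for part in partitions:
        pca = PCA(n_components=len(part)).fit(X[:, part])
        R[np.ix_(part, part)] = pca.components_.T

    tree = DecisionTreeClassifier().fit(X @ R, y)
    return tree, R    # keep R so new samples can be rotated before prediction

# Usage sketch: rotate test data with the same matrix before predicting.
# tree, R = fit_rotation_tree(X_train, y_train, K=3)
# y_pred = tree.predict(X_test @ R)
```

Because every base tree draws its own random feature partition, each tree sees the data in a different rotated coordinate system, which yields diverse trees even though all of them are trained on the full dataset.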