first used by Gams (1989), employed 10-fold partitioning. Parmanto
et al. (1996) have also used this idea for creating an ensemble of neural
networks. Domingos (1996) used cross-validated committees to speed up
his own rule induction algorithm RISE, whose complexity is O(n^2), making
it unsuitable for processing large databases. In this case, partitioning
is applied by predetermining a maximum number of examples to which
the algorithm can be applied at once. The full training set is randomly
divided into approximately equal-sized partitions. RISE is then run on each
partition separately. Each set of rules grown from the examples in partition
p is tested on the examples in partition p + 1, in order to reduce overfitting
and to improve accuracy.
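A minimal sketch of this partitioning scheme follows. The `fit_rules` and `score_rules` callables are hypothetical stand-ins for the base learner (e.g. RISE) and its evaluation routine, not Domingos's actual implementation; the wrap-around from the last partition back to the first is likewise an assumption about how "partition p + 1" is handled at the boundary.

```python
import random

def cross_validated_committee(examples, n_partitions, fit_rules, score_rules):
    """Sketch of cross-validated committees as described above.

    fit_rules(data) and score_rules(rules, data) are hypothetical
    stand-ins for the learner and its evaluator.
    """
    examples = list(examples)          # avoid mutating the caller's list
    random.shuffle(examples)
    # Split the training set into approximately equal-sized partitions.
    partitions = [examples[i::n_partitions] for i in range(n_partitions)]

    committee = []
    for p in range(n_partitions):
        rules = fit_rules(partitions[p])                 # grow rules on partition p
        holdout = partitions[(p + 1) % n_partitions]     # test on partition p + 1
        accuracy = score_rules(rules, holdout)
        committee.append((rules, accuracy))
    return committee
```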
9.5 Ensemble Diversity
In an ensemble, combining the outputs of several classifiers is useful
only if they disagree on some inputs [Tumer and Ghosh (1996)].
Creating an ensemble in which each classifier is as different as possible,
while still being consistent with the training set, is theoretically known to be
an important condition for obtaining improved ensemble performance [Krogh
and Vedelsby (1995)]. According to Hu (2001), diversified classifiers lead to
uncorrelated errors, which in turn improve classification accuracy.
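The effect of uncorrelated errors can be illustrated with a small calculation. Under the idealized assumption that each of m classifiers errs independently with the same individual accuracy p, the accuracy of a majority vote can be computed directly from the binomial distribution (this is an illustration of the general principle, not a result from any of the cited papers):

```python
from math import comb

def majority_vote_accuracy(p, m):
    """Probability that a majority vote of m independent classifiers,
    each correct with probability p, is correct (m odd)."""
    return sum(comb(m, k) * p**k * (1 - p)**(m - k)
               for k in range(m // 2 + 1, m + 1))

# With p = 0.7, adding uncorrelated members steadily raises accuracy:
for m in (1, 5, 11, 21):
    print(m, round(majority_vote_accuracy(0.7, m), 3))
# 1 0.7
# 5 0.837
# 11 0.922
# 21 0.974
```

When member errors are correlated, the gain is smaller; in the worst case of identical classifiers, the ensemble is no better than a single member.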
In the regression context, the bias-variance-covariance decomposition
has been suggested to explain why and how diversity among individual
models contributes to overall ensemble accuracy. Nevertheless, in the
classification context, there is no complete and agreed-upon theory [Brown
et al. (2005)]. More specifically, there is no simple analogue of the variance-
covariance decomposition for the zero-one loss function. Instead, there
are several ways to define such a decomposition, each with its own
assumptions.
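For a uniformly weighted regression ensemble, one common statement of the decomposition is the following (the notation follows the survey literature, e.g. Brown et al. (2005); this is a standard form rather than the only one):

```latex
\bar{f} = \frac{1}{M}\sum_{i=1}^{M} f_i, \qquad
\mathbb{E}\big[(\bar{f}-d)^2\big]
  = \overline{\text{bias}}^{\,2}
  + \frac{1}{M}\,\overline{\text{var}}
  + \Big(1-\frac{1}{M}\Big)\,\overline{\text{covar}},
```
where
```latex
\overline{\text{bias}} = \frac{1}{M}\sum_{i}\big(\mathbb{E}[f_i]-d\big), \qquad
\overline{\text{var}}  = \frac{1}{M}\sum_{i}\mathbb{E}\big[(f_i-\mathbb{E}[f_i])^2\big],
```
```latex
\overline{\text{covar}} = \frac{1}{M(M-1)}\sum_{i}\sum_{j\neq i}
  \mathbb{E}\big[(f_i-\mathbb{E}[f_i])(f_j-\mathbb{E}[f_j])\big].
```

Only the covariance term can be negative, and the variance term is suppressed as M grows; decorrelating the members' errors is therefore what diversity buys in the regression setting.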
Sharkey (1999) suggested a taxonomy of methods for creating diversity
in ensembles of neural networks. More specifically, Sharkey's taxonomy
refers to four different aspects: the initial weights; the training data used;
the architecture of the networks; and the training algorithm used.
Brown et al. (2005) suggest a different taxonomy, which consists of the
following branches: varying the starting points within the hypothesis space;
varying the set of hypotheses that are accessible by the ensemble members
(for instance, by manipulating the training set); and varying the way each
member traverses the space.
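Two of these mechanisms, varying the starting point (e.g. initial weights via a random seed) and varying the accessible hypotheses (e.g. by bootstrap resampling of the training set), can be sketched as follows. The `make_learner(seed)` factory and the learner's `fit` method are hypothetical stand-ins for any base learner, not an API from the cited works:

```python
import random

def build_diverse_ensemble(train, make_learner, m,
                           vary_init=True, vary_data=True):
    """Sketch of two diversity-creation mechanisms from the taxonomies above.

    make_learner(seed) is a hypothetical factory whose seed varies the
    starting point in hypothesis space (e.g. initial network weights);
    bootstrap resampling varies the hypotheses each member can reach.
    """
    ensemble = []
    for i in range(m):
        seed = i if vary_init else 0                       # vary the starting point
        data = ([random.choice(train) for _ in train]      # vary the training set
                if vary_data else train)
        learner = make_learner(seed)
        learner.fit(data)
        ensemble.append(learner)
    return ensemble
```

Varying the architecture or the training algorithm itself, the remaining aspects in Sharkey's taxonomy, would correspond to making `make_learner` return structurally different models rather than differently seeded copies of the same one.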