first used by Gams (1989), employed 10-fold partitioning. Parmanto
et al. (1996) have also used this idea for creating an ensemble of neural
networks. Domingos (1996) used cross-validated committees to speed up
his own rule induction algorithm RISE, whose complexity is O(n^2), making
it unsuitable for processing large databases. In this case, partitioning
is applied by predetermining a maximum number of examples to which
the algorithm can be applied at once. The full training set is randomly
divided into approximately equal-sized partitions. RISE is then run on each
partition separately. Each set of rules grown from the examples in partition
p is tested on the examples in partition p + 1, in order to reduce overfitting
and to improve accuracy.
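A minimal sketch of this partitioning scheme follows. The `fit_rules` and `score_rules` callables are hypothetical stand-ins for the base learner (e.g. RISE) and its evaluation routine, not Domingos's actual implementation; the wrap-around from the last partition back to the first is likewise an assumption about how "partition p + 1" is handled at the boundary.

```python
import random

def cross_validated_committee(examples, n_partitions, fit_rules, score_rules):
    """Sketch of cross-validated committees as described above.

    fit_rules(data) and score_rules(rules, data) are hypothetical
    stand-ins for the learner and its evaluator.
    """
    examples = list(examples)          # avoid mutating the caller's list
    random.shuffle(examples)
    # Split the training set into approximately equal-sized partitions.
    partitions = [examples[i::n_partitions] for i in range(n_partitions)]

    committee = []
    for p in range(n_partitions):
        rules = fit_rules(partitions[p])                 # grow rules on partition p
        holdout = partitions[(p + 1) % n_partitions]     # test on partition p + 1
        accuracy = score_rules(rules, holdout)
        committee.append((rules, accuracy))
    return committee
```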
9.5 Ensemble Diversity
In an ensemble, combining the outputs of several classifiers is useful
only if they disagree on some inputs [Tumer and Ghosh (1996)].
Creating an ensemble in which each classifier is as different as possible,
while still being consistent with the training set, is theoretically known to be
an important condition for obtaining improved ensemble performance [Krogh
and Vedelsby (1995)]. According to Hu (2001), diversified classifiers lead to
uncorrelated errors, which in turn improve classification accuracy.
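The effect of uncorrelated errors can be illustrated with a small calculation. Under the idealized assumption that each of m classifiers errs independently with the same individual accuracy p, the accuracy of a majority vote can be computed directly from the binomial distribution (this is an illustration of the general principle, not a result from any of the cited papers):

```python
from math import comb

def majority_vote_accuracy(p, m):
    """Probability that a majority vote of m independent classifiers,
    each correct with probability p, is correct (m odd)."""
    return sum(comb(m, k) * p**k * (1 - p)**(m - k)
               for k in range(m // 2 + 1, m + 1))

# With p = 0.7, adding uncorrelated members steadily raises accuracy:
for m in (1, 5, 11, 21):
    print(m, round(majority_vote_accuracy(0.7, m), 3))
# 1 0.7
# 5 0.837
# 11 0.922
# 21 0.974
```

When member errors are correlated, the gain is smaller; in the worst case of identical classifiers, the ensemble is no better than a single member.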
In the regression context, the bias-variance-covariance decomposition
has been suggested to explain why and how diversity among individual
models contributes to overall ensemble accuracy. Nevertheless, in the
classification context, there is no complete and agreed-upon theory [Brown
et al. (2005)]. More specifically, there is no simple analogue of the variance-
covariance decomposition for the zero-one loss function. Instead, there
are several ways to define such a decomposition, each with its own
assumptions.
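For a uniformly weighted regression ensemble, one common statement of the decomposition is the following (the notation follows the survey literature, e.g. Brown et al. (2005); this is a standard form rather than the only one):

```latex
\bar{f} = \frac{1}{M}\sum_{i=1}^{M} f_i, \qquad
\mathbb{E}\big[(\bar{f}-d)^2\big]
  = \overline{\text{bias}}^{\,2}
  + \frac{1}{M}\,\overline{\text{var}}
  + \Big(1-\frac{1}{M}\Big)\,\overline{\text{covar}},
```
where
```latex
\overline{\text{bias}} = \frac{1}{M}\sum_{i}\big(\mathbb{E}[f_i]-d\big), \qquad
\overline{\text{var}}  = \frac{1}{M}\sum_{i}\mathbb{E}\big[(f_i-\mathbb{E}[f_i])^2\big],
```
```latex
\overline{\text{covar}} = \frac{1}{M(M-1)}\sum_{i}\sum_{j\neq i}
  \mathbb{E}\big[(f_i-\mathbb{E}[f_i])(f_j-\mathbb{E}[f_j])\big].
```

Only the covariance term can be negative, and the variance term is suppressed as M grows; decorrelating the members' errors is therefore what diversity buys in the regression setting.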
Sharkey (1999) suggested a taxonomy of methods for creating diversity
in ensembles of neural networks. More specifically, Sharkey's taxonomy
refers to four different aspects: the initial weights; the training data used;
the architecture of the networks; and the training algorithm used.
Brown et al. (2005) suggest a different taxonomy, which consists of the
following branches: varying the starting points within the hypothesis space;
varying the set of hypotheses that are accessible by the ensemble members
(for instance, by manipulating the training set); and varying the way each
member traverses the space.
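Two of these mechanisms, varying the starting point (e.g. initial weights via a random seed) and varying the accessible hypotheses (e.g. by bootstrap resampling of the training set), can be sketched as follows. The `make_learner(seed)` factory and the learner's `fit` method are hypothetical stand-ins for any base learner, not an API from the cited works:

```python
import random

def build_diverse_ensemble(train, make_learner, m,
                           vary_init=True, vary_data=True):
    """Sketch of two diversity-creation mechanisms from the taxonomies above.

    make_learner(seed) is a hypothetical factory whose seed varies the
    starting point in hypothesis space (e.g. initial network weights);
    bootstrap resampling varies the hypotheses each member can reach.
    """
    ensemble = []
    for i in range(m):
        seed = i if vary_init else 0                       # vary the starting point
        data = ([random.choice(train) for _ in train]      # vary the training set
                if vary_data else train)
        learner = make_learner(seed)
        learner.fit(data)
        ensemble.append(learner)
    return ensemble
```

Varying the architecture or the training algorithm itself, the remaining aspects in Sharkey's taxonomy, would correspond to making `make_learner` return structurally different models rather than differently seeded copies of the same one.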