learning is to encourage different individual classifiers in the ensemble to represent different sub-spaces of the problem. While the classifiers are being created simultaneously, they may interact with one another in order to specialize (for instance, by using a correlation penalty term in the error function to encourage such specialization).
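A minimal numerical sketch of one such penalty, in the spirit of negative correlation learning, is given below. The function name, the lam strength parameter and the use of numpy are illustrative assumptions rather than the exact formulation of any particular method.

    import numpy as np

    def penalized_errors(member_outputs, target, lam=0.5):
        # member_outputs: outputs f_i of each ensemble member for one example;
        # target: the desired output y; lam: illustrative penalty strength.
        f_bar = member_outputs.mean()            # the ensemble (average) output
        deviations = member_outputs - f_bar
        squared_error = 0.5 * (member_outputs - target) ** 2
        # Correlation penalty p_i = (f_i - f_bar) * sum_{j != i} (f_j - f_bar);
        # minimizing it pushes members to deviate from the ensemble output in
        # different directions, i.e. to specialize on different sub-spaces.
        penalty = deviations * (deviations.sum() - deviations)
        return squared_error + lam * penalty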
9.5.2 Manipulating the Training Samples
In this method, each classifier is trained on a different variation, or subset, of the original dataset. This method is useful for inducers whose variance component of the error is relatively large (such as decision trees and neural networks); that is to say, small changes in the training set may cause a major change in the obtained classifier. This category contains procedures such as bagging, boosting and cross-validated committees.
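As an illustration of the last of these, the following sketch trains a cross-validated committee: one classifier per fold, each fitted to the data with that fold held out, combined by majority vote. The helper names are hypothetical; scikit-learn and non-negative integer class labels are assumed.

    import numpy as np
    from sklearn.model_selection import KFold
    from sklearn.tree import DecisionTreeClassifier

    def cross_validated_committee(X, y, n_members=5, seed=0):
        # Each member sees the training data with one fold removed.
        members = []
        kf = KFold(n_splits=n_members, shuffle=True, random_state=seed)
        for train_idx, _ in kf.split(X):
            members.append(
                DecisionTreeClassifier(random_state=seed).fit(X[train_idx], y[train_idx]))
        return members

    def committee_predict(members, X):
        # Majority vote over the members (assumes non-negative integer labels).
        votes = np.array([m.predict(X) for m in members])
        return np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, votes)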
9.5.2.1 Resampling
The distribution of tuples among the different classifiers can be random, as in the bagging algorithm or in arbiter trees. Other methods distribute the tuples based on the class distribution, such that the class distribution in each subset is approximately the same as that in the entire dataset. It has been shown that proportional distribution, as used in combiner trees [Chan and Stolfo (1993)], can achieve higher accuracy than random distribution.
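The following sketch contrasts the two schemes: random sampling with replacement, as in bagging, versus a proportional subset that preserves the class distribution. Function names are illustrative and numpy is assumed.

    import numpy as np

    def bootstrap_sample(X, y, rng):
        # Random sampling with replacement, as used by bagging.
        idx = rng.integers(0, len(X), size=len(X))
        return X[idx], y[idx]

    def proportional_sample(X, y, size, rng):
        # Draw a subset whose class distribution matches the full dataset,
        # in the spirit of the proportional distribution used by combiner trees.
        idx = []
        for c in np.unique(y):
            members = np.flatnonzero(y == c)
            n_c = int(round(size * len(members) / len(y)))
            idx.extend(rng.choice(members, size=n_c, replace=False))
        idx = np.asarray(idx)
        return X[idx], y[idx]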
Instead of performing sampling with replacement, some methods (like AdaBoost or Wagging) manipulate the weights that are attached to each instance in the training set. The base inducer should be capable of taking these weights into account. Recently, a novel framework was proposed in which each instance contributes to the committee formation with a fixed weight, while contributing with different individual weights to the derivation of the different constituent classifiers [Christensen et al. (2004)]. This approach encourages model diversity without inadvertently biasing the ensemble towards any particular instance.
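For concreteness, the following is a simplified sketch of a single boosting round in the style of two-class AdaBoost, showing how instance weights are passed to the base inducer and then increased for misclassified instances. The decision stump is only one example of a weight-aware base inducer, and the helper name is illustrative.

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def boosting_round(X, y, weights):
        # Fit the base inducer using the current instance weights.
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=weights)      # the inducer must accept weights
        miss = stump.predict(X) != y
        err = np.average(miss, weights=weights)     # weighted training error
        alpha = 0.5 * np.log((1.0 - err) / max(err, 1e-10))
        # Increase the weight of misclassified instances, decrease the rest.
        new_w = weights * np.exp(np.where(miss, alpha, -alpha))
        return stump, alpha, new_w / new_w.sum()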
9.5.2.2 Creation
The DECORATE algorithm [Melville and Mooney (2003)] is a dependent approach in which the ensemble is generated iteratively, learning a classifier at each iteration and adding it to the current ensemble. The first member is created by using the base induction algorithm on the original training set. The successive classifiers are trained on an artificial set that combines the original training data with artificially generated examples.
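A highly simplified sketch of this iterative scheme follows. It draws artificial examples from per-feature Gaussians fitted to the training data and labels them so that they disagree with the current ensemble's majority vote; the full DECORATE algorithm additionally keeps a new member only if the ensemble's training error does not increase, a check omitted here. Numeric features and integer class labels 0..k-1 are assumed, and all names are illustrative.

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def decorate_sketch(X, y, n_members=5, n_artificial=None, seed=0):
        rng = np.random.default_rng(seed)
        n_artificial = n_artificial or len(X)
        classes = np.unique(y)                      # assumes labels 0..k-1
        ensemble = [DecisionTreeClassifier(random_state=seed).fit(X, y)]
        while len(ensemble) < n_members:
            # Artificial examples drawn from per-feature Gaussians fitted to the data.
            X_art = rng.normal(X.mean(axis=0), X.std(axis=0) + 1e-9,
                               size=(n_artificial, X.shape[1]))
            # Label each artificial example with a class that differs from the
            # current ensemble's majority-vote prediction.
            votes = np.array([m.predict(X_art) for m in ensemble])
            pred = np.apply_along_axis(
                lambda c: np.bincount(c, minlength=len(classes)).argmax(), 0, votes)
            y_art = np.array([rng.choice(classes[classes != p]) for p in pred])
            # Train the next member on the union of real and artificial data.
            clf = DecisionTreeClassifier(random_state=seed).fit(
                np.vstack([X, X_art]), np.concatenate([y, y_art]))
            ensemble.append(clf)   # DECORATE keeps it only if ensemble error does not grow
        return ensemble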