The nature of the classification problem: In some ensemble methods,
the nature of the classification problem to be solved determines
the number of classifiers. For instance, in the ECOC algorithm the
number of classes determines the ensemble size.
Number of processors available: In independent methods, the number
of processors available for parallel learning can serve as an upper
bound on the number of classifiers that are trained in parallel.
There are three methods for determining the ensemble size, as
described in the following subsections.
9.6.2 Pre-selection of the Ensemble Size
This is the simplest way to determine the ensemble size. Many ensemble
algorithms have a controlling parameter, such as "number of iterations",
which can be set by the user. Algorithms such as Bagging belong to this
category. In other cases, the nature of the classification problem determines
the number of members (as in the case of ECOC).
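
The two pre-selection cases above can be sketched in a few lines. In the first, the size is simply a user-supplied number; in the second, it follows from the number of classes. The function name and the three coding schemes below are illustrative assumptions, not taken from the text: one-vs-all, one-vs-one, and exhaustive codes are common ECOC constructions, and the text itself does not say which one is meant.

```python
def ecoc_ensemble_size(n_classes, scheme="one_vs_all"):
    """Number of binary classifiers implied by an ECOC-style coding scheme.

    Hypothetical helper for illustration; the scheme names are assumptions.
    """
    if n_classes < 2:
        raise ValueError("need at least two classes")
    if scheme == "one_vs_all":
        # One binary classifier per class.
        return n_classes
    if scheme == "one_vs_one":
        # One binary classifier per unordered pair of classes.
        return n_classes * (n_classes - 1) // 2
    if scheme == "exhaustive":
        # All distinct non-trivial two-way splits of the classes.
        return 2 ** (n_classes - 1) - 1
    raise ValueError(f"unknown scheme: {scheme}")


# Case 1: the user fixes the size up front (e.g. Bagging's iteration count).
n_iterations = 25  # arbitrary illustrative value

# Case 2: the problem fixes the size (e.g. ECOC with 4 classes).
sizes = {s: ecoc_ensemble_size(4, s)
         for s in ("one_vs_all", "one_vs_one", "exhaustive")}
```

In both cases the size is known before any classifier is trained, which is what distinguishes pre-selection from the methods that follow.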
9.6.3 Selection of the Ensemble Size while Training
Some ensemble algorithms try to determine the best ensemble size
while training. Typically, as new classifiers are added to the ensemble, these
algorithms check whether the contribution of the last classifier to the
ensemble performance is still significant. If it is not, the ensemble
algorithm stops. These algorithms usually also have a controlling parameter
that bounds the maximum size of the ensemble.
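
A minimal sketch of such a stopping rule follows. It assumes a hypothetical callback, add_member_and_score, that trains one more classifier and returns the ensemble's accuracy on some held-out estimate; the threshold min_gain stands in for "significant", which the text does not quantify. Discarding the last, non-contributing member is also a choice made here for illustration.

```python
def select_size_while_training(add_member_and_score, max_size, min_gain=0.005):
    """Grow an ensemble until the newest member stops helping.

    add_member_and_score: hypothetical callable that trains one more member
        and returns the resulting ensemble accuracy.
    max_size: controlling parameter bounding the ensemble size.
    min_gain: assumed threshold below which a gain counts as insignificant.
    Returns the number of members kept.
    """
    if max_size < 1:
        raise ValueError("max_size must be at least 1")
    best = add_member_and_score()  # the first member is always kept
    size = 1
    while size < max_size:
        score = add_member_and_score()
        if score - best < min_gain:
            # The last classifier's contribution is not significant: stop
            # and do not count it.
            break
        best = score
        size += 1
    return size


# Usage with simulated accuracies in place of real training.
simulated = iter([0.70, 0.75, 0.78, 0.781, 0.79])
kept = select_size_while_training(lambda: next(simulated), max_size=10)
```

With these simulated scores the fourth member improves accuracy by only about 0.001, so the loop stops and three members are kept.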
An algorithm that decides when a sufficient number of classification
trees have been created was recently proposed [Banfield et al. (2007)]. The
algorithm uses the out-of-bag error estimate, and is shown to result in an
accurate ensemble for those methods that incorporate bagging into the
construction of the ensemble. Specifically, the algorithm works by first
smoothing the out-of-bag error graph with a sliding window in order to
reduce the variance. After the smoothing has been completed, the algorithm
takes a larger window on the smoothed data points and determines the
maximum accuracy within that window. It continues to process windows
until the maximum accuracy within a particular window no longer increases.
At this point, the stopping criterion has been reached and the algorithm
returns the ensemble with the maximum raw accuracy from within that
window.
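
The procedure described above can be sketched as follows. This is one reading of the description, not the authors' implementation, and several details are assumptions: the input is taken to be out-of-bag accuracy indexed by ensemble size, the smoothing is a trailing moving average, the larger windows are consecutive and non-overlapping, "that window" is read as the window in which the maximum stopped increasing, and the default widths are placeholders.

```python
def moving_average(values, width):
    """Trailing moving average; early points average over fewer values."""
    smoothed = []
    for i in range(len(values)):
        window = values[max(0, i - width + 1):i + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed


def choose_ensemble_size(oob_accuracy, smooth_width=5, search_width=20):
    """Pick an ensemble size from an out-of-bag accuracy curve.

    oob_accuracy[i] is the out-of-bag accuracy with i + 1 members.
    smooth_width and search_width are assumed defaults.
    """
    if not oob_accuracy:
        raise ValueError("oob_accuracy must not be empty")
    smoothed = moving_average(oob_accuracy, smooth_width)

    best_smoothed = float("-inf")
    stop_start = 0
    for start in range(0, len(smoothed), search_width):
        stop_start = start
        window_max = max(smoothed[start:start + search_width])
        if window_max <= best_smoothed:
            break  # smoothed maximum no longer increases: stop here
        best_smoothed = window_max

    # Return the size with the highest raw (unsmoothed) accuracy in the
    # window where the search ended (or the final window if it never stopped).
    raw = oob_accuracy[stop_start:stop_start + search_width]
    offset = max(range(len(raw)), key=raw.__getitem__)
    return stop_start + offset + 1


# Small illustration with smoothing disabled and windows of three points.
curve = [0.5, 0.6, 0.7, 0.72, 0.75, 0.74, 0.73, 0.745, 0.74, 0.9]
size = choose_ensemble_size(curve, smooth_width=1, search_width=3)
```

In this illustration the third window's maximum (0.745) does not exceed the second window's (0.75), so the search stops there and never sees the later value of 0.9. Stopping early in this way is the point of the criterion, since in practice the later trees would not have been built.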