Level 3 — A majority vote will not always correctly classify a given
instance, but at least one ensemble member always correctly classifies it.
Level 4 — The function is not always covered by the members of the
ensemble.
Brown et al. (2005) claim that the above four-level scheme provides
no indication of how typical the error behavior described by the assigned
diversity level is. This is especially true when the ensemble exhibits
different diversity levels on different subsets of the instance space.
There are other, more quantitative measures, which Brown et al. (2005)
categorize into two types: pairwise and non-pairwise.
Pairwise measures calculate the average of a particular distance metric
between all possible pairings of members in the ensemble, such as the
Q-statistic [Brown et al. (2005)] or the kappa-statistic [Margineantu and
Dietterich (1997)]. Non-pairwise measures either use the idea of entropy
(such as [Cunningham and Carney (2000)]) or calculate the correlation of
each ensemble member with the averaged output. A comparison of several
measures of diversity has led to the conclusion that most of them are
correlated [Kuncheva and Whitaker (2003)].
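As a minimal sketch of the pairwise idea, the Q-statistic for two classifiers can be computed from the counts of instances on which both are correct, both are wrong, or exactly one is correct; averaging over all pairs gives an ensemble-level diversity score. The function names below are illustrative, not from the cited works.

```python
import itertools

def q_statistic(pred_i, pred_j, y_true):
    """Yule's Q-statistic for one pair of classifiers.

    Q = (N11*N00 - N01*N10) / (N11*N00 + N01*N10), where Nab counts
    instances on which classifier i is correct (a=1) or not (a=0)
    and classifier j is correct (b=1) or not (b=0).
    Q near 0 indicates independent errors; negative Q indicates
    classifiers that tend to err on different instances (more diverse).
    """
    n11 = n00 = n10 = n01 = 0
    for pi, pj, y in zip(pred_i, pred_j, y_true):
        ci, cj = (pi == y), (pj == y)
        if ci and cj:
            n11 += 1
        elif ci and not cj:
            n10 += 1
        elif cj and not ci:
            n01 += 1
        else:
            n00 += 1
    return (n11 * n00 - n01 * n10) / (n11 * n00 + n01 * n10)

def mean_pairwise_q(all_preds, y_true):
    """Average Q-statistic over all possible pairs of ensemble members."""
    pairs = list(itertools.combinations(range(len(all_preds)), 2))
    return sum(q_statistic(all_preds[i], all_preds[j], y_true)
               for i, j in pairs) / len(pairs)
```

For example, two classifiers that are wrong on disjoint sets of instances yield Q = -1 (maximally diverse), while a classifier paired with itself yields Q = 1.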
9.6 Ensemble Size
9.6.1 Selecting the Ensemble Size
An important aspect of ensemble methods is to define how many component
classifiers should be used. There are several factors that may determine this
size:
Desired accuracy — In most cases, ensembles containing 10 classifiers
are sufficient for reducing the error rate [Hansen and Salamon (1990)].
Nevertheless, there is empirical evidence indicating that when AdaBoost
uses decision trees, error reduction continues even in relatively large
ensembles containing 25 classifiers [Opitz and Maclin (1999)]. In disjoint
partitioning approaches, there may be a trade-off between the number
of subsets and the final accuracy. The size of each subset cannot be too
small because sufficient data must be available for each learning process
to produce an effective classifier.
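The Hansen and Salamon (1990) argument can be sketched with a binomial calculation: if T classifiers err independently, each with error rate p < 0.5, a majority vote errs only when more than half of them are wrong, a probability that drops quickly as T grows. This is a simplified model assuming independent errors, not a result taken from the text.

```python
from math import comb

def majority_vote_error(p, T):
    """Error probability of a majority vote over T independent
    classifiers, each with individual error rate p.

    Assumes odd T so no ties occur; the ensemble errs when at
    least T//2 + 1 members are wrong (a binomial tail probability).
    """
    k_min = T // 2 + 1
    return sum(comb(T, k) * p**k * (1 - p)**(T - k)
               for k in range(k_min, T + 1))
```

With p = 0.3, a single classifier errs 30% of the time, but an ensemble of 11 such classifiers errs well under 10% of the time; with p = 0.5 the vote gains nothing, which is why each member must be better than random.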
Computational cost — Increasing the number of classifiers usually
increases the computational cost and decreases the ensemble's
comprehensibility. For that reason, users may set their preferences by
predefining a limit on the ensemble size.