members. The idea is simply to give each classifier a different projection
of the training set [Tumer and Oza]. Feature subset-based ensembles
potentially facilitate the creation of a classifier for high dimensionality data
sets without the feature selection drawbacks mentioned above. Moreover,
these methods can be used to improve the classification performance due to
the reduced correlation among the classifiers. Rokach and Maimon (2001)
also indicate that the reduced size of the dataset implies faster induction
of classifiers. Feature subset-based methods also avoid the class under-representation
that may occur in instance-subset methods such as bagging. There are three
popular strategies for creating feature subset-based ensembles: the random-based,
reduct-based, and collective-performance-based strategies.
9.5.4.2.1 Random-based Strategy
The most straightforward techniques for creating feature subset-based
ensembles are based on random selection. Ho (1998) creates a forest of
decision trees. The ensemble is constructed systematically by pseudo-
randomly selecting subsets of features. The training instances are projected
to each subset and a decision tree is constructed using the projected training
samples. The process is repeated several times to create the forest. The
classifications of the individual trees are combined by averaging the condi-
tional probability of each class at the leaves (distribution summation). Ho
shows that simple random selection of feature subsets may be an effective
technique because the diversity of the ensemble members compensates for
their lack of accuracy.
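The random-subspace construction and the distribution-summation combination rule described above can be sketched as follows. This is a minimal illustration, not Ho's implementation: the function names are our own, and the per-tree leaf distributions in the usage example are hypothetical stand-ins for what trained decision trees would report.

```python
import random


def random_subspaces(n_features, subset_size, n_trees, seed=0):
    # Pseudo-randomly draw one feature subset per ensemble member.
    rng = random.Random(seed)
    return [rng.sample(range(n_features), subset_size) for _ in range(n_trees)]


def project(instances, subset):
    # Project each training instance onto the chosen feature subset;
    # a decision tree would then be induced from the projected sample.
    return [[x[f] for f in subset] for x in instances]


def distribution_summation(per_tree_probs):
    # Combine the trees by averaging the conditional probability of each
    # class reported at the leaves (Ho's "distribution summation").
    classes = set().union(*per_tree_probs)
    n = len(per_tree_probs)
    return {c: sum(p.get(c, 0.0) for p in per_tree_probs) / n for c in classes}
```

For instance, if three trees report leaf distributions `{"a": 0.9, "b": 0.1}`, `{"a": 0.4, "b": 0.6}`, and `{"a": 0.7, "b": 0.3}` for a test instance, the averaged distribution favors class `a` even though one tree disagreed.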
Bay (1999) proposed using simple voting in order to combine outputs
from multiple KNN (K-Nearest Neighbor) classifiers, each having access
only to a random subset of the original features. Each classifier employs
the same number of features. A technique for building ensembles of simple
Bayesian classifiers in random feature subsets was also examined [Tsymbal
and Puuronen (2002)] for improving medical applications.
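Bay's scheme of KNN members with simple voting can be sketched in a few lines. This is an illustrative toy, not Bay's code: the brute-force 1-NN and the function names are our own, and each member sees only its assigned feature subset.

```python
from collections import Counter


def knn_predict(train_X, train_y, x, k=1):
    # Plain K-nearest-neighbour vote using squared Euclidean distance.
    order = sorted(range(len(train_X)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)))
    return Counter(train_y[i] for i in order[:k]).most_common(1)[0][0]


def subset_knn_ensemble(train_X, train_y, x, subsets, k=1):
    # Each KNN member has access only to its random feature subset
    # (all subsets the same size); simple voting combines the outputs.
    votes = []
    for s in subsets:
        proj_train = [[row[f] for f in s] for row in train_X]
        proj_x = [x[f] for f in s]
        votes.append(knn_predict(proj_train, train_y, proj_x, k))
    return Counter(votes).most_common(1)[0][0]
```

Note that a member restricted to an uninformative subset can vote wrongly, but the majority over many subsets tends to recover the correct class, which is the diversity effect the text describes.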
9.5.4.2.2 Reduct-based Strategy
A reduct is defined as the smallest feature subset that has the same
predictive power as the whole feature set. By definition, the size of an
ensemble created from reducts is limited by the number of features.
There have been several attempts to create classifier ensembles by
combining several reducts.
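As a concrete reading of the definition, a reduct can be found by searching for the smallest feature subset that preserves consistency: no two instances may agree on all the subset's features yet carry different class labels. The sketch below uses exhaustive search, which is practical only for small feature sets; the function names and the consistency criterion (a standard rough-set-style formulation) are our own illustration, not a specific published algorithm.

```python
from itertools import combinations


def is_consistent(X, y, subset):
    # True if no two instances agree on all features in `subset`
    # but have different class labels.
    seen = {}
    for row, label in zip(X, y):
        key = tuple(row[f] for f in subset)
        if seen.setdefault(key, label) != label:
            return False
    return True


def find_reduct(X, y):
    # Smallest feature subset with the same predictive power (consistency)
    # as the whole feature set, by exhaustive search over subset sizes.
    n = len(X[0])
    full = tuple(range(n))
    if not is_consistent(X, y, full):
        return full  # the data themselves are inconsistent; keep everything
    for size in range(1, n + 1):
        for subset in combinations(range(n), size):
            if is_consistent(X, y, subset):
                return subset
    return full
```

Because many different minimal subsets may satisfy the criterion, a dataset typically admits several reducts, which is what makes combining them into an ensemble possible.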