9.5.4.2.3
Collective-Performance-based Strategy
Cunningham and Carney (2000) introduced an ensemble feature selection
strategy that randomly constructs the initial ensemble. Then, an iterative
refinement is performed based on a hill-climbing search in order to improve
the accuracy and diversity of the base classifiers. For all the feature subsets,
an attempt is made to switch (include or delete) each feature. If the resulting
feature subset produces a better performance on the validation set, that
change is kept. This process is continued until no further improvements are
obtained. Similarly, Zenobi and Cunningham (2001) suggest that the search
for the different feature subsets should be guided not only by the associated
error but also by the disagreement, or ambiguity, among the ensemble
members.
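The hill-climbing refinement described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `evaluate` function stands in for training a base classifier on the subset and scoring it on the validation set, and the toy scoring rule at the end is purely hypothetical.

```python
import random

def hill_climb_subset(n_features, evaluate, seed=0):
    """Iteratively switch (include or delete) each feature; keep a
    change only if validation performance improves, and stop when a
    full pass yields no improvement (after Cunningham and Carney, 2000)."""
    rng = random.Random(seed)
    # randomly constructed initial subset (kept non-empty)
    subset = {f for f in range(n_features) if rng.random() < 0.5} or {0}
    best = evaluate(subset)
    improved = True
    while improved:
        improved = False
        for f in range(n_features):
            candidate = subset ^ {f}   # toggle feature f in or out
            if not candidate:          # require at least one feature
                continue
            score = evaluate(candidate)
            if score > best:           # keep the beneficial switch
                subset, best = candidate, score
                improved = True
    return subset, best

# toy validation score: features {1, 3} are informative (hypothetical)
score = lambda s: len(s & {1, 3}) - 0.1 * len(s - {1, 3})
subset, best = hill_climb_subset(5, score)
print(subset)   # converges to {1, 3}
```

Because each accepted switch strictly improves the validation score, the loop is guaranteed to terminate, though only at a local optimum.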
Tsymbal et al. (2004) compare several feature selection methods that
incorporate diversity as a component of the fitness function in the search for
the best collection of feature subsets. This study shows that there are some
datasets in which the ensemble feature selection method can be sensitive
to the choice of the diversity measure. Moreover, no particular measure is
superior in all cases.
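One common way to fold diversity into such a fitness function is mean pairwise disagreement over the members' predictions. The sketch below is illustrative only: the disagreement measure is one of several compared by Tsymbal et al. (2004), and the weighting scheme `alpha` is an assumption, not taken from that study.

```python
def disagreement(preds_a, preds_b):
    """Fraction of validation instances on which two members disagree."""
    return sum(a != b for a, b in zip(preds_a, preds_b)) / len(preds_a)

def ensemble_diversity(all_preds):
    """Mean pairwise disagreement across all member pairs."""
    n = len(all_preds)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return sum(disagreement(all_preds[i], all_preds[j])
               for i, j in pairs) / len(pairs)

def fitness(accuracy, diversity, alpha=0.5):
    """Hypothetical fitness: accuracy plus a diversity bonus; alpha
    controls the accuracy/diversity trade-off."""
    return accuracy + alpha * diversity

# three members' predictions on four validation instances (toy data)
preds = [[0, 1, 1, 0],
         [0, 1, 0, 0],
         [1, 1, 1, 0]]
print(ensemble_diversity(preds))   # (0.25 + 0.25 + 0.5) / 3 = 1/3
```

Swapping `disagreement` for another measure (e.g. a correlation-based one) changes only the first function, which is what makes the sensitivity to the choice of measure easy to probe empirically.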
Gunter and Bunke (2004) suggest employing a feature subset search
algorithm in order to find different subsets of the given features. The feature
subset search algorithm not only takes the performance of the ensemble into
account, but also directly encourages diversity among the feature subsets.
Combining genetic search with ensemble feature selection was also
examined in the literature. Opitz and Shavlik (1996) applied GAs to
ensembles using genetic operators that were designed explicitly for hidden
nodes in knowledge-based neural networks. In later research, Opitz (1999)
used genetic search for ensemble feature selection. This genetic ensemble
feature selection (GEFS) strategy begins by creating an initial population of
classifiers, where each classifier is generated by randomly selecting a different
subset of features. Then, new candidate classifiers are continually produced
by applying the genetic operators of crossover and mutation to the feature
subsets. The final ensemble is composed of the most fit classifiers.
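The GEFS loop can be sketched at the level of feature subsets. This is a hedged illustration, not Opitz's implementation: the `evaluate` function stands in for training a member on the subset and measuring its validation fitness, and the population size, operator rates, and uniform-crossover choice are all assumptions.

```python
import random

def gefs(n_features, evaluate, pop_size=10, generations=20,
         ensemble_size=5, seed=1):
    """Sketch of genetic ensemble feature selection (after Opitz, 1999):
    evolve a population of feature subsets via crossover and mutation,
    then keep the most fit subsets as the final ensemble."""
    rng = random.Random(seed)

    def random_subset():
        s = frozenset(f for f in range(n_features) if rng.random() < 0.5)
        return s or frozenset([rng.randrange(n_features)])

    pop = [random_subset() for _ in range(pop_size)]
    for _ in range(generations):
        # fitness-based selection of parents
        parents = sorted(pop, key=evaluate, reverse=True)[:pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # uniform crossover: each feature inherited from one parent
            child = {f for f in range(n_features)
                     if f in (a if rng.random() < 0.5 else b)}
            # mutation: flip each feature's membership with low probability
            child = {f for f in range(n_features)
                     if (f in child) != (rng.random() < 0.05)}
            children.append(frozenset(child)
                            or frozenset([rng.randrange(n_features)]))
        pop = parents + children
    # final ensemble: the feature subsets of the most fit classifiers
    return sorted(pop, key=evaluate, reverse=True)[:ensemble_size]

# toy fitness rewarding informative features {1, 3} (hypothetical)
fit = lambda s: len(s & {1, 3}) - 0.1 * len(s - {1, 3})
ensemble = gefs(6, fit)
```

Keeping the parents alongside the children (elitism) ensures the best subsets found so far survive each generation, though the search remains stochastic.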
9.5.4.2.4
Feature Set Partitioning
Feature set partitioning is a particular case of feature subset-based
ensembles in which the subsets are pairwise disjoint. At the same time,
feature set partitioning generalizes the task of feature selection, which aims
to provide a single representative set of features from which a classifier is