An ensemble can produce better decisions than can be obtained from using a
single model, and some of the drawbacks of wrappers and filters can be solved
by using an ensemble. First, although filters are generally less accurate than
wrappers, the ensemble's voting process filters out noisy results. Second, the
heavy computational cost of wrappers is avoided by operating a group of
inexpensive filters. The idea of building a predictive model by integrating
multiple models has been under investigation for a long time.
Ensemble feature selection methods [Opitz (1999)] extend traditional
feature selection methods by looking for a set of feature subsets that
will promote disagreement among the base classifiers. Simple random
selection of feature subsets may be an effective technique for ensemble
feature selection because the lack of accuracy in the ensemble members
is compensated for by their diversity [Ho (1998)]. Tsymbal and Puuronen
(2002) presented a technique for building ensembles of simple Bayes
classifiers in random feature subsets.
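As an illustration, the following is a minimal sketch of this idea, assuming
scikit-learn's GaussianNB as the simple Bayes learner and plain majority
voting; the function names and parameters are illustrative assumptions, not
the published implementations.

```python
# Minimal sketch of ensemble feature selection via random subspaces, in the
# spirit of Ho (1998) and Tsymbal and Puuronen (2002): each simple Bayes
# classifier is trained on a different random subset of the features, and
# predictions are combined by majority vote.
import numpy as np
from sklearn.naive_bayes import GaussianNB

def random_subspace_ensemble(X, y, n_members=10, subset_size=None, seed=0):
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    k = subset_size or max(1, n_features // 2)
    members = []
    for _ in range(n_members):
        # each member sees a different random feature subset
        idx = rng.choice(n_features, size=k, replace=False)
        clf = GaussianNB().fit(X[:, idx], y)
        members.append((idx, clf))
    return members

def predict_majority(members, X):
    # assumes integer class labels 0..K-1
    votes = np.array([clf.predict(X[:, idx]) for idx, clf in members])
    return np.array([np.bincount(col).argmax() for col in votes.T])
```

Training each member on a different random subspace is what induces the
disagreement among the base classifiers that the voting process exploits.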
The hill-climbing ensemble feature selection strategy randomly constructs
the initial ensemble [Cunningham and Carney (2000)]. Then, an
iterative refinement is performed based on hill-climbing search in order to
improve the accuracy and diversity of the base classifiers. For all the feature
subsets, an attempt is made to switch (include or delete) each feature. If the
resulting feature subset produces better performance on the validation set,
that change is kept. This process is continued until no further improvements
are obtained.
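A minimal sketch of this refinement loop, assuming feature subsets are
represented as sets of column indices and that evaluate() is a hypothetical
helper that trains a base classifier on a given subset and returns its
accuracy on the validation set:

```python
# Hill-climbing refinement in the spirit of Cunningham and Carney (2000):
# for every subset in the ensemble, each feature is switched (included or
# deleted), and the change is kept only if validation performance improves.
def hill_climb(subsets, n_features, evaluate):
    improved = True
    while improved:                        # stop when a full pass changes nothing
        improved = False
        for s in subsets:                  # s: set of feature indices
            for f in range(n_features):
                candidate = set(s)
                if f in candidate:
                    candidate.discard(f)   # try deleting the feature
                else:
                    candidate.add(f)       # try including the feature
                if candidate and evaluate(candidate) > evaluate(s):
                    s.clear()
                    s.update(candidate)    # keep the improving switch
                    improved = True
    return subsets
```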
Genetic Ensemble Feature Selection (GEFS) [Opitz (1999)] uses genetic
search for ensemble feature selection. This strategy begins with creating
an initial population of classifiers where each classifier is generated by
randomly selecting a different subset of features. Then, new candidate
classifiers are continually produced by using the genetic operators of
crossover and mutation on the feature subsets. The final ensemble is
composed of the most fitted classifiers.
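The following is a minimal sketch of such a genetic search, where individuals
are bit masks over the features; fitness() is a hypothetical helper that
scores a mask (Opitz's actual fitness function also rewards ensemble
diversity, which is omitted here):

```python
# Genetic search over feature subsets in the spirit of GEFS [Opitz (1999)]:
# uniform crossover and bit-flip mutation generate candidate subsets, and
# the most fit individuals survive into the next generation.
import random

def genetic_feature_search(n_features, fitness, pop_size=20,
                           generations=50, mutation_rate=0.05, seed=0):
    rnd = random.Random(seed)
    pop = [[rnd.random() < 0.5 for _ in range(n_features)]
           for _ in range(pop_size)]
    for _ in range(generations):
        children = []
        for _ in range(pop_size):
            a, b = rnd.sample(pop, 2)
            # uniform crossover: each gene drawn from one of the two parents
            child = [x if rnd.random() < 0.5 else y for x, y in zip(a, b)]
            # bit-flip mutation; fitness() should guard against an empty mask
            child = [not g if rnd.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        # survival of the fittest: keep the best pop_size individuals
        pop = sorted(pop + children, key=fitness, reverse=True)[:pop_size]
    return pop   # the final ensemble is built from these feature subsets
```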
Another method for creating a set of feature selection solutions using
a genetic algorithm was proposed by Oliveira et al. (2003). They create a
Pareto-optimal front in relation to two different objectives: accuracy on a
validation set and number of features. Following that, they select the best
feature selection solution.
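A minimal sketch of extracting that Pareto-optimal front, assuming solutions
is a hypothetical list of (feature subset, validation accuracy) pairs produced
by the search; a solution is kept if no other solution is at least as good on
both objectives and strictly better on one:

```python
# Pareto front over the two objectives of Oliveira et al. (2003):
# validation accuracy (maximized) and number of features (minimized).
def pareto_front(solutions):
    front = []
    for subset, acc in solutions:
        dominated = any(
            a2 >= acc and len(s2) <= len(subset)
            and (a2 > acc or len(s2) < len(subset))
            for s2, a2 in solutions
        )
        if not dominated:
            front.append((subset, acc))
    return front
```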
In the statistics literature, the most well-known feature-oriented ensemble
algorithm is the MARS algorithm [Friedman (1991)]. In this algorithm,
a multiple regression function is approximated using linear splines and their
tensor products.
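For reference, the MARS model can be written in its standard form (the
notation below follows the common presentation in the literature):

```latex
% Standard MARS model form [Friedman (1991)]: a weighted sum of basis
% functions, each a hinge (linear spline) or a tensor product of hinges.
\hat{f}(x) = \beta_0 + \sum_{m=1}^{M} \beta_m B_m(x),
\qquad
B_m(x) = \prod_{k=1}^{K_m} \bigl[\, s_{k,m}\,(x_{v(k,m)} - t_{k,m}) \,\bigr]_{+}
```

where [u]+ = max(0, u) denotes the hinge function, the t's are knots, each
s is either +1 or -1, and v(k, m) selects the input variable entering the
k-th factor of the m-th basis function.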