is required to find a subset. Top-down inducers of decision trees can
be considered anytime algorithms for feature selection: they gradually
improve performance and can be stopped at any point to provide a
sub-optimal feature subset.
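The following minimal sketch illustrates this anytime view: a decision tree is grown top-down to increasing depths, and at each stopping point the features used for splits so far form the currently available (possibly sub-optimal) subset. The dataset and the depth schedule are illustrative assumptions, not prescribed by the text.

```python
# Anytime feature selection via top-down decision tree induction (sketch).
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

for depth in (1, 2, 4, 8):  # stop the inducer "at any time"
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X, y)
    # Features with nonzero importance are those chosen for splits so far:
    # the feature subset available at this stopping point.
    subset = [i for i, imp in enumerate(tree.feature_importances_) if imp > 0]
    print(f"depth={depth}: {len(subset)} features selected -> {subset}")
```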
Decision trees have also been used as an evaluation mechanism for
directing the feature selection search. For instance, a hybrid learning
methodology that integrates genetic algorithms (GAs) and decision tree
inducers to find the best feature subset was proposed in [Bala et al. (1995)].
A GA is used to search the space of all possible subsets of a large set of
candidate discrimination features. To evaluate a given feature subset, a
decision tree is trained and its accuracy serves as the fitness of that
subset, which, in turn, is used by the GA to evolve better feature sets.
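A minimal sketch of this wrapper approach is given below: chromosomes are binary feature masks, and fitness is the cross-validated accuracy of a decision tree trained on the masked features. All GA parameters (population size, generations, mutation rate) are illustrative assumptions; [Bala et al. (1995)] does not prescribe these values.

```python
# GA wrapper with decision tree accuracy as fitness (illustrative sketch).
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = load_breast_cancer(return_X_y=True)
n_features = X.shape[1]

def fitness(mask):
    if not mask.any():                       # an empty subset is invalid
        return 0.0
    clf = DecisionTreeClassifier(random_state=0)
    return cross_val_score(clf, X[:, mask], y, cv=3).mean()

pop = rng.random((20, n_features)) < 0.5     # random initial feature masks
for _ in range(10):                          # a few generations
    scores = np.array([fitness(ind) for ind in pop])
    # Tournament selection: the fitter of two random individuals is a parent.
    a, b = rng.integers(0, len(pop), size=(2, len(pop)))
    parents = pop[np.where(scores[a] >= scores[b], a, b)]
    # One-point crossover between consecutive parent pairs.
    children = parents.copy()
    for i in range(0, len(children) - 1, 2):
        cut = int(rng.integers(1, n_features))
        children[i, cut:] = parents[i + 1, cut:]
        children[i + 1, cut:] = parents[i, cut:]
    # Bit-flip mutation: each gene toggles with a small probability.
    children ^= rng.random(children.shape) < 0.02
    pop = children

best = max(pop, key=fitness)
print("selected features:", np.flatnonzero(best))
```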
13.7 Limitations of Feature Selection Methods
Despite their popularity, feature selection methodologies used to
overcome the obstacles of high dimensionality have several drawbacks:
• The assumption that a large set of input features can be reduced to a
small subset of relevant features is not always true; in some cases the
target feature is actually affected by most of the input features, and
removing features causes a significant loss of important information.
• The outcome (i.e. the subset) of many feature selection algorithms (for
example, almost any algorithm based on the wrapper methodology)
depends strongly on the training set size. That is, if the training set is
small, the reduced subset will also be small, and relevant features might
be lost. Consequently, the induced classifiers might achieve lower
accuracy than classifiers with access to all relevant features.
• In some cases, even after eliminating a set of irrelevant features, the
researcher is left with a relatively large number of relevant features.
• The backward elimination strategy used by some methods is extremely
inefficient when working with large-scale databases, where the number of
original features exceeds 100 (see the sketch after this list).
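The sketch below illustrates why backward elimination scales poorly: each pass retrains one model per remaining feature, so eliminating down from n features costs on the order of n² model fits. The dataset, the wrapped model, and the stopping rule are illustrative assumptions.

```python
# Sequential backward elimination (sketch): O(n^2) model fits overall.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
remaining = list(range(X.shape[1]))

def score(features):
    clf = DecisionTreeClassifier(random_state=0)
    return cross_val_score(clf, X[:, features], y, cv=3).mean()

current = score(remaining)
while len(remaining) > 1:
    # Try dropping each remaining feature: one model fit per candidate.
    trials = [(score([f for f in remaining if f != drop]), drop)
              for drop in remaining]
    best_score, worst_feature = max(trials)
    if best_score < current:        # stop when no removal helps
        break
    current = best_score
    remaining.remove(worst_feature)

print(f"kept {len(remaining)} features, accuracy {current:.3f}")
```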
One way to deal with the above-mentioned disadvantages is to use a
very large training set (which should grow exponentially as the number
of input features increases). However, the researcher rarely enjoys this
privilege, and even when it does happen, the researcher will probably