is required to find a subset. Top-down inducers of decision trees can
be considered anytime algorithms for feature selection: they gradually
improve performance and can be stopped at any point to provide a
sub-optimal feature subset.
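The following minimal sketch illustrates this anytime view: a decision tree is grown top-down to increasing depths, and at each stopping point the features used for splits so far form the currently available (possibly sub-optimal) subset. The dataset and the depth schedule are illustrative assumptions, not prescribed by the text.

```python
# Anytime feature selection via top-down decision tree induction (sketch).
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

for depth in (1, 2, 4, 8):  # stop the inducer "at any time"
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X, y)
    # Features with nonzero importance are those chosen for splits so far:
    # the feature subset available at this stopping point.
    subset = [i for i, imp in enumerate(tree.feature_importances_) if imp > 0]
    print(f"depth={depth}: {len(subset)} features selected -> {subset}")
```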
Decision trees have also been used as an evaluation mechanism for
directing the feature selection search. For instance, a hybrid learning
methodology that integrates genetic algorithms (GAs) and decision tree
inducers to find the best feature subset was proposed in [Bala et al. (1995)].
A GA is used to search the space of all possible subsets of a large set of
candidate discrimination features. To evaluate a given feature subset, a
decision tree is trained and its accuracy serves as the fitness of that
subset, which, in turn, is used by the GA to evolve better feature sets.
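A minimal sketch of this wrapper approach is given below: chromosomes are binary feature masks, and fitness is the cross-validated accuracy of a decision tree trained on the masked features. All GA parameters (population size, generations, mutation rate) are illustrative assumptions; [Bala et al. (1995)] does not prescribe these values.

```python
# GA wrapper with decision tree accuracy as fitness (illustrative sketch).
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = load_breast_cancer(return_X_y=True)
n_features = X.shape[1]

def fitness(mask):
    if not mask.any():                       # an empty subset is invalid
        return 0.0
    clf = DecisionTreeClassifier(random_state=0)
    return cross_val_score(clf, X[:, mask], y, cv=3).mean()

pop = rng.random((20, n_features)) < 0.5     # random initial feature masks
for _ in range(10):                          # a few generations
    scores = np.array([fitness(ind) for ind in pop])
    # Tournament selection: the fitter of two random individuals is a parent.
    a, b = rng.integers(0, len(pop), size=(2, len(pop)))
    parents = pop[np.where(scores[a] >= scores[b], a, b)]
    # One-point crossover between consecutive parent pairs.
    children = parents.copy()
    for i in range(0, len(children) - 1, 2):
        cut = int(rng.integers(1, n_features))
        children[i, cut:] = parents[i + 1, cut:]
        children[i + 1, cut:] = parents[i, cut:]
    # Bit-flip mutation: each gene toggles with a small probability.
    children ^= rng.random(children.shape) < 0.02
    pop = children

best = max(pop, key=fitness)
print("selected features:", np.flatnonzero(best))
```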
13.7 Limitations of Feature Selection Methods
Despite their popularity, feature selection methodologies used to
overcome the obstacles of high dimensionality have several drawbacks:
• The assumption that a large set of input features can be reduced to a
small subset of relevant features is not always true; in some cases the
target feature is actually affected by most of the input features, and
removing features causes a significant loss of important information.
• The outcome (i.e. the subset) of many feature selection algorithms (for
example, almost any algorithm based on the wrapper methodology)
depends strongly on the training set size. That is, if the training set is
small, the reduced subset will also be small, and relevant features might
be lost. Consequently, the induced classifiers might achieve lower
accuracy than classifiers with access to all relevant features.
• In some cases, even after eliminating a set of irrelevant features, the
researcher is left with a relatively large number of relevant features.
• The backward elimination strategy used by some methods is extremely
inefficient when working with large-scale databases, where the number of
original features exceeds 100 (see the sketch after this list).
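The sketch below illustrates why backward elimination scales poorly: each pass retrains one model per remaining feature, so eliminating down from n features costs on the order of n² model fits. The dataset, the wrapped model, and the stopping rule are illustrative assumptions.

```python
# Sequential backward elimination (sketch): O(n^2) model fits overall.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
remaining = list(range(X.shape[1]))

def score(features):
    clf = DecisionTreeClassifier(random_state=0)
    return cross_val_score(clf, X[:, features], y, cv=3).mean()

current = score(remaining)
while len(remaining) > 1:
    # Try dropping each remaining feature: one model fit per candidate.
    trials = [(score([f for f in remaining if f != drop]), drop)
              for drop in remaining]
    best_score, worst_feature = max(trials)
    if best_score < current:        # stop when no removal helps
        break
    current = best_score
    remaining.remove(worst_feature)

print(f"kept {len(remaining)} features, accuracy {current:.3f}")
```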
One way to deal with the above-mentioned disadvantages is to use a
very large training set (which should grow exponentially as the number
of input features increases). However, the researcher rarely enjoys this
privilege, and even when it does happen, the researcher will probably