applications only if the original domain size of the input feature can be
decreased dramatically.
On the one hand, feature selection can be used as a preprocessing step
before building a decision tree. On the other hand, the decision tree can be
used as a feature selector for other induction methods.
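As a rough sketch of the second usage, assuming scikit-learn (not prescribed by the text), one might grow a tree and pass only the attributes it actually split on to another inducer; the choice of nearest neighbors below is purely illustrative:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Use the decision tree as a feature selector: keep the attributes
# that appear in at least one internal split of the fitted tree.
tree = DecisionTreeClassifier(random_state=0).fit(X, y)
used = np.unique(tree.tree_.feature[tree.tree_.feature >= 0])

# Hand the selected attributes to a different induction method
# (k-NN here is an arbitrary stand-in for "other induction methods").
knn = KNeighborsClassifier().fit(X[:, used], y)
print("features kept by the tree:", used)
```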
At first glance, it may seem redundant to use feature selection as a
preprocessing phase before training: decision tree inducers, as opposed
to other induction methods, incorporate a built-in feature selection
mechanism in their training phase. Indeed, all the criteria described in
Section 5.1 are, in effect, feature selection criteria.
Still, it is well known that correlated and irrelevant features may
degrade the performance of decision tree inducers. This phenomenon
can be explained by the fact that feature selection in decision trees is
performed one attribute at a time, and only the selection at the root node
considers the entire decision space. In subsequent nodes, the training set
is divided into several subsets and features are selected according to their
local predictive power [Perner (2001)]. Geometrically, this means that
features are selected in orthogonal decision sub-spaces, which do not
necessarily represent the distribution of the entire instance space. It
has been shown that the predictive performance of decision trees can be
improved by an appropriate feature pre-selection phase. Moreover, feature
selection can reduce the number of nodes in the tree, making it more
compact.
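For instance, here is a minimal sketch of such a pre-selection phase, assuming scikit-learn and a univariate mutual-information filter (the text does not prescribe a particular selector, and the choice of k = 10 is arbitrary):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Pre-select the 10 features with the highest mutual information,
# then grow the tree only on that reduced subset.
model = make_pipeline(
    SelectKBest(mutual_info_classif, k=10),
    DecisionTreeClassifier(random_state=0),
)
model.fit(X, y)

# A smaller attribute set typically yields a more compact tree.
tree = model.named_steps["decisiontreeclassifier"]
print("nodes in tree:", tree.tree_.node_count)
```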
Formally, the problem of feature subset selection can be defined as
follows [Jain et al. (1997)]: Let A be the original set of features, with
cardinality n. Let d represent the desired number of features in the selected
subset B, B ⊆ A. Let the feature selection criterion function for the set B
be represented by J(B). Without any loss of generality, a lower value of J
is considered to indicate a better feature subset (for instance, if J represents
the generalization error). The problem of feature selection is to find an
optimal subset B that solves the following optimization problem:
$$\min_{Z}\; J(Z) \quad \text{s.t.}\quad Z \subseteq A,\ |Z| = d. \tag{13.1}$$
An exhaustive search would require examining all $\binom{n}{d} = \frac{n!}{d!\,(n-d)!}$ possible $d$-subsets of the feature set $A$.
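The following sketch instantiates this exhaustive search over d-subsets directly. Here J is taken, purely as an assumption, to be the cross-validated error of a decision tree restricted to the candidate subset; the text only requires that a lower J indicate a better subset, and the dataset and d = 3 are arbitrary choices:

```python
from itertools import combinations

from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier


def J(X, y, subset):
    """Criterion to minimize; here: 5-fold cross-validated error of a
    decision tree using only the columns in `subset` (an assumption --
    any criterion where lower is better would fit Eq. (13.1))."""
    scores = cross_val_score(DecisionTreeClassifier(random_state=0),
                             X[:, subset], y, cv=5)
    return 1.0 - scores.mean()  # error = 1 - accuracy


X, y = load_wine(return_X_y=True)
n, d = X.shape[1], 3  # choose d = 3 of the n = 13 wine features

# Exhaustive search: evaluate all n!/(d!(n-d)!) candidate d-subsets
# (here C(13, 3) = 286) and keep the one minimizing J.
best = min(combinations(range(n), d), key=lambda s: J(X, y, list(s)))
print("best subset:", best, "J =", J(X, y, list(best)))
```

Even in this toy setting the search fits 286 subsets; for realistic n and d the binomial count explodes, which is why the heuristic search strategies discussed in this chapter are needed.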