Embedded methods that integrate FS as part of the training process can be
more efficient in several respects: they take full advantage of the available data by
not requiring the training data to be split into separate training and validation sets,
and they reach a solution faster by avoiding the re-training of a predictor for each
feature subset explored. Embedded methods are not new: decision trees such as C4.5 have
a built-in mechanism to carry out FS [43].
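As a rough illustration of this built-in mechanism, the sketch below scores candidate features by information gain, the kind of criterion a C4.5-style tree evaluates when choosing a split, so that FS happens as a side effect of training. The toy dataset and helper names here are illustrative, not taken from the chapter.

```python
import math

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    probs = [labels.count(c) / n for c in set(labels)]
    return -sum(p * math.log2(p) for p in probs)

def information_gain(data, labels, feature):
    """Gain of splitting on one categorical feature, as a C4.5-style tree
    would compute while growing -- feature selection built into training."""
    n = len(labels)
    gain = entropy(labels)
    for value in set(row[feature] for row in data):
        subset = [y for row, y in zip(data, labels) if row[feature] == value]
        gain -= len(subset) / n * entropy(subset)
    return gain

# Toy data: feature 0 separates the classes perfectly, feature 1 does not,
# so the tree's split criterion would select feature 0 at the root.
data = [['a', 'x'], ['a', 'y'], ['b', 'x'], ['b', 'y']]
labels = [0, 0, 1, 1]
gains = [information_gain(data, labels, f) for f in range(2)]
```

A tree induction algorithm repeats this evaluation at every node, so irrelevant features are simply never chosen for a split.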
7.3 Aspects
This section is devoted to the discussion of several important aspects of FS.
Each subsection deals with one general facet of FS, but the subsections are neither
directly connected nor sorted according to any particular criterion. For more advanced
and more recent developments of FS, please read Sect. 7.5 of this chapter.
7.3.1 Output of Feature Selection
From the point of view of their output, FS methods can be grouped into two
categories. The first consists of ranking features according to some evaluation
criterion; the other consists of choosing a minimum set of features that satisfies an
evaluation criterion. Next, we explain both of them in more detail.
7.3.1.1 Feature Ranking Techniques
In this category of methods, the expected output is a ranked list of features,
ordered according to an evaluation measure. The measure can be of any type:
information, distance, dependence, consistency or accuracy. Thus, a feature selector
belonging to this family does not report a minimum subset of features;
instead, it returns the relevance of each feature.
The basic idea consists of evaluating each feature with a measure and attaching the
result values to each feature. Then, the features are sorted according to the values. The
run time complexity of this algorithm is O(MN + M²), where M is the number of
features and N is the number of instances. There are many variations of this algorithm
that give rise to different FS methods; the common property is that the outcome is a
ranked list of features. Algorithm 5 summarizes the operation of a univariate feature
ranking technique.
For performing actual FS, the simplest way is to choose the first m features for the
task at hand, whenever the most appropriate value of m is known. This is not always
the case, however, and there is no straightforward procedure to obtain m. One solution
is to build DM models repeatedly until the generalization error no longer decreases.
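A univariate feature ranking pass of this kind can be sketched in a few lines of Python. Here each feature is scored independently with a simple dependence measure (absolute Pearson correlation with the class labels, standing in for any of the measure types listed above); the toy dataset, the `pearson_score` helper and the top-m cut-off are our own illustrations, not taken from the text.

```python
def pearson_score(column, labels):
    """Absolute Pearson correlation between one feature column and the labels,
    used here as one possible univariate relevance measure."""
    n = len(column)
    mx = sum(column) / n
    my = sum(labels) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(column, labels))
    sx = sum((x - mx) ** 2 for x in column) ** 0.5
    sy = sum((y - my) ** 2 for y in labels) ** 0.5
    if sx == 0 or sy == 0:
        return 0.0
    return abs(cov / (sx * sy))

def rank_features(data, labels):
    """data: list of N instances, each a list of M feature values.
    Scores every feature independently, then sorts by relevance;
    scoring all M features over N instances dominates the cost."""
    m = len(data[0])
    scores = [(j, pearson_score([row[j] for row in data], labels))
              for j in range(m)]
    scores.sort(key=lambda s: s[1], reverse=True)
    return scores  # ranked list of (feature index, relevance) pairs

# Tiny illustrative dataset: feature 0 tracks the label, feature 1 is noise.
data = [[0, 5], [1, 3], [0, 4], [1, 5], [0, 3], [1, 4]]
labels = [0, 1, 0, 1, 0, 1]
ranking = rank_features(data, labels)
top_m = [j for j, _ in ranking[:1]]  # keep the m best-ranked features (m = 1 here)
```

Note that the ranking itself leaves the choice of the cut-off m open, which is exactly the difficulty discussed above.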
 