Recall from Section 5.1.3:
\[
\mathrm{InformationGain}(a_i, a_j, S) = \mathrm{Entropy}(a_j, S) \;-\; \sum_{v_{i,k} \in \mathrm{dom}(a_i)} \frac{\left|\sigma_{a_i = v_{i,k}} S\right|}{\left|S\right|} \cdot \mathrm{Entropy}\!\left(a_j, \sigma_{a_i = v_{i,k}} S\right), \tag{13.25}
\]

\[
\mathrm{Entropy}(a_i, S) = \sum_{v_{i,k} \in \mathrm{dom}(a_i)} -\frac{\left|\sigma_{a_i = v_{i,k}} S\right|}{\left|S\right|} \cdot \log_2 \frac{\left|\sigma_{a_i = v_{i,k}} S\right|}{\left|S\right|}. \tag{13.26}
\]
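The following is a minimal Python sketch of Eqs. (13.25)–(13.26). The function names entropy and information_gain, and the representation of each attribute as a list with one value per instance in S, are illustrative choices, not something prescribed by the text.

```python
import math
from collections import Counter

def entropy(values):
    # Entropy of an attribute's value distribution over the instances in S, Eq. (13.26).
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def information_gain(a_i, a_j):
    # Information gain of attribute a_i with respect to a_j, Eq. (13.25).
    # a_i and a_j are parallel lists holding each instance's value for the two attributes.
    n = len(a_i)
    gain = entropy(a_j)
    for v, count in Counter(a_i).items():
        # instances selected by sigma_{a_i = v}, projected onto a_j
        subset = [a_j[idx] for idx in range(n) if a_i[idx] == v]
        gain -= (count / n) * entropy(subset)
    return gain
```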
Symmetrical uncertainty is used (rather than simple gain ratio) because
it is a symmetric measure and can therefore be used to measure feature-
feature correlations where there is no notion of one attribute being the
“class” as such.
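Symmetrical uncertainty is commonly defined as the information gain normalized by the sum of the two attributes' entropies; assuming that definition, it can be added to the sketch above as:

```python
def symmetrical_uncertainty(a_i, a_j):
    # 2 * IG(a_i, a_j) / (H(a_i) + H(a_j)): normalized to [0, 1] and symmetric
    # in its two arguments, so it can score feature-feature correlations.
    denom = entropy(a_i) + entropy(a_j)
    return 0.0 if denom == 0 else 2.0 * information_gain(a_i, a_j) / denom
```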
As for the organization of the search, the following methods can be used:
Best First Search; Forward Selection Search; Gain Ratio; Chi-Square; the OneR
classifier; and Information Gain.
Besides CFS, other evaluation methods can be considered, including the
consistency subset evaluator and the wrapper subset evaluator with simple
classifiers (k-nearest neighbors, logistic regression and naïve Bayes); a wrapper sketch is given below.
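The sketch below combines a wrapper subset evaluator with a forward-selection search, using scikit-learn's k-nearest-neighbors classifier as the simple base learner. The dataset X, y, the number of folds and the stopping rule are placeholder assumptions, not details taken from the text.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

def wrapper_forward_selection(X, y, n_folds=5):
    # Greedily add the feature whose inclusion most improves cross-validated accuracy.
    selected, remaining, best_score = [], list(range(X.shape[1])), 0.0
    while remaining:
        scores = {}
        for f in remaining:
            candidate = selected + [f]
            scores[f] = cross_val_score(KNeighborsClassifier(),
                                        X[:, candidate], y, cv=n_folds).mean()
        f_best = max(scores, key=scores.get)
        if scores[f_best] <= best_score:
            break                      # no remaining feature improves the subset
        best_score = scores[f_best]
        selected.append(f_best)
        remaining.remove(f_best)
    return selected, best_score
```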
13.5.3.2 Bagging
The most well-known independent method is bagging (bootstrap aggregat-
ing). In this case, each feature selector is executed on a sample of instances
taken with replacement from the training set. Usually, each sample size
is equal to the size of the original training set. Note that since sampling
with replacement is used, some of the instances may appear more than
once in the same sample and some may not be included at all. Although
the training samples are different from each other, they are certainly not
independent from a statistical point of view.
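A minimal sketch of this idea is shown below, assuming a generic select_features(X, y) function (for example, a ranker based on symmetrical uncertainty) that returns the indices of the features it selects; the number of bags and the vote-counting aggregation are illustrative assumptions.

```python
import numpy as np
from collections import Counter

def bagged_feature_selection(X, y, select_features, n_bags=25, seed=None):
    # Run the feature selector on bootstrap samples and count how often each feature is chosen.
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    votes = Counter()
    for _ in range(n_bags):
        # sample with replacement; sample size equals the size of the original training set
        idx = rng.integers(0, n, size=n)
        votes.update(select_features(X[idx], y[idx]))
    return votes   # features with many votes are kept in the final subset
```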
13.6 Using Decision Trees for Feature Selection
Using decision trees for feature selection has one important advantage: it is an
"anytime" procedure, meaning it can be stopped at any point and still return a
usable feature subset. However, for high-dimensional datasets, the feature
selection process becomes computationally intensive.
Decision trees can be used to implement a trade-off between the
performance of the selected features and the computation time required to
select them.
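One simple way to realize this idea, sketched below with scikit-learn, is to fit a decision tree and keep only the features that actually appear in its splits; the max_depth parameter is an illustrative knob for trading selection quality against computation time, not a detail from the text.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def tree_feature_selection(X, y, max_depth=None):
    # Fit a tree and return the indices of features used in at least one split.
    tree = DecisionTreeClassifier(max_depth=max_depth).fit(X, y)
    node_features = tree.tree_.feature        # leaf nodes are marked with -2
    return sorted(int(f) for f in np.unique(node_features) if f >= 0)
```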