13.5.3 Feature Ensemble Generator
For the ensemble to be effective, there must be some diversity among the feature subsets. Diversity may be obtained through different presentations of the input data or through variations in feature selector design. The following sections describe each of these approaches.
13.5.3.1 Multiple Feature Selectors
In this approach, we simply use a set of different feature selection
algorithms. The basic assumption is that since different algorithms have
different inductive biases, they will create different feature subsets.
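As an illustration, the following sketch builds several candidate subsets with selectors whose scoring criteria differ; the use of scikit-learn and these particular score functions is an illustrative assumption, not something prescribed by the method above.

from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, chi2, f_classif, mutual_info_classif

X, y = load_breast_cancer(return_X_y=True)

# Each score function embodies a different inductive bias, so each
# selector tends to return a different feature subset.
selectors = {
    "chi2": SelectKBest(chi2, k=5),
    "anova_f": SelectKBest(f_classif, k=5),
    "mutual_info": SelectKBest(mutual_info_classif, k=5),
}

subsets = {}
for name, selector in selectors.items():
    selector.fit(X, y)
    subsets[name] = sorted(selector.get_support(indices=True))

for name, subset in subsets.items():
    print(name, subset)

Each member of the ensemble would then be trained on one of these subsets.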
The proposed method can be employed with the correlation-based
feature subset selection (CFS) as a subset evaluator [Hall (1999)]. CFS
evaluates the worth of a subset of attributes by considering the individual
predictive ability of each feature along with the degree of redundancy
between them. Subsets of features that are highly correlated with the class
while having low inter-correlation are preferred.
At the heart of the CFS algorithm is a heuristic for evaluating the
worth or merit of a subset of features. This heuristic takes into account
the usefulness of individual features for predicting the class label along
with the level of inter-correlation among them. The heuristic is based on
the following hypothesis: a good feature subset contains features that are
highly correlated with the class, but which are uncorrelated with each other.
Equation (13.23) formalizes the feature selection heuristic:
$$M_B = \frac{k\,\overline{r_{cf}}}{\sqrt{k + k(k-1)\,\overline{r_{ff}}}},\qquad(13.23)$$

where $M_B$ is the heuristic "merit" of a feature subset $B$ containing $k$ features; $\overline{r_{cf}}$ is the average feature-class correlation; and $\overline{r_{ff}}$ is the average feature-feature correlation.
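The merit of Equation (13.23) can be computed directly, as in the sketch below. As a simplifying assumption, plain Pearson correlation stands in for the symmetrical uncertainty measure that CFS actually uses (see Equation (13.24)), and the function name and interface are illustrative.

import numpy as np

def cfs_merit(X, y, subset):
    """Heuristic merit M_B of a feature subset (Eq. 13.23)."""
    k = len(subset)
    # Average absolute feature-class correlation, r_cf.
    r_cf = np.mean([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in subset])
    # Average absolute feature-feature correlation, r_ff, over all pairs.
    if k > 1:
        pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
        r_ff = np.mean([abs(np.corrcoef(X[:, a], X[:, b])[0, 1]) for a, b in pairs])
    else:
        r_ff = 0.0
    return (k * r_cf) / np.sqrt(k + k * (k - 1) * r_ff)

In CFS this merit typically guides a heuristic search, such as best-first search, over the space of candidate subsets.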
In order to apply Equation (13.23) to estimate the merit of a feature
subset, it is necessary to compute the correlation (dependence) between
attributes. For discrete class problems, CFS first discretises numeric features and then uses symmetrical uncertainty (a modified information gain measure) to calculate feature-class and feature-feature correlations:
$$SU = \frac{\mathrm{InformationGain}(a_i, a_j, S)}{\mathrm{Entropy}(a_i, S) + \mathrm{Entropy}(a_j, S)}.\qquad(13.24)$$
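A minimal sketch of Equation (13.24) for discrete attributes follows, assuming NumPy; the entropies and information gain are estimated from empirical frequencies, and the function names are illustrative. (Hall's original formulation additionally scales the ratio by two so that SU is normalized to the range [0, 1].)

import numpy as np

def entropy(values):
    """Empirical Shannon entropy of a discrete attribute."""
    _, counts = np.unique(values, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(a_i, a_j):
    """Reduction in the entropy of a_i after observing a_j."""
    conditional = sum(
        (a_j == v).mean() * entropy(a_i[a_j == v]) for v in np.unique(a_j)
    )
    return entropy(a_i) - conditional

def symmetrical_uncertainty(a_i, a_j):
    """SU of Eq. (13.24); defined as zero when both attributes are constant."""
    denominator = entropy(a_i) + entropy(a_j)
    return information_gain(a_i, a_j) / denominator if denominator > 0 else 0.0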