data is used as the predicted class; otherwise, the majority class of all
matching instances is used. In such cases, the minimum description length
(MDL) principle guides the search by estimating the cost of encoding a
decision table and the training examples it misclassifies with respect to a
given feature subset. The features in the final decision table are then used
with other learning algorithms.
13.3.1.4 An Information Theoretic Feature Filter
There are many filter techniques based on information theory and
probabilistic reasoning [Koller and Sahami (1996)]. The rationale behind
this approach is that, since the goal of an induction algorithm is to estimate
the probability distribution over the class values given the original feature
set, feature subset selection should yield a subset whose induced class
distributions remain as close as possible to the original ones.
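As a rough illustration of this criterion, the sketch below estimates, for discrete features, the expected KL divergence between the class distribution conditioned on the full feature set and the one conditioned on a candidate subset (smaller is better). The function names and the naive empirical estimation are assumptions of this sketch, not the exact procedure of Koller and Sahami.

```python
import numpy as np

def expected_kl(X, y, subset, eps=1e-9):
    """Average KL divergence between P(C | all features) and P(C | subset),
    estimated empirically over the training instances (discrete features).
    Illustrative sketch only; `eps` avoids log(0) on deterministic cells."""
    classes = np.unique(y)

    def class_dists(keys):
        # empirical P(class | key) for each distinct feature-value tuple
        dists = {}
        for k in set(keys):
            idx = np.array([i for i, kk in enumerate(keys) if kk == k])
            dists[k] = np.array([(y[idx] == c).mean() for c in classes])
        return dists

    full_keys = [tuple(row) for row in X]
    sub_keys = [tuple(row) for row in X[:, subset]]
    p_full, p_sub = class_dists(full_keys), class_dists(sub_keys)

    total = 0.0
    for fk, sk in zip(full_keys, sub_keys):
        p, q = p_full[fk] + eps, p_sub[sk] + eps
        total += float(np.sum(p * np.log(p / q)))
    return total / len(y)
```

Under this criterion, a subset that fully determines the class (KL near zero) is preferred over one that washes the class distribution out toward the prior.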
13.3.1.5 RELIEF Algorithm
RELIEF [Kira and Rendell (1992)] uses instance-based learning to assign
a relevance weight to each feature. The weight of each feature reflects its
ability to distinguish between the class values. The features are ranked by
their weights, and those exceeding a user-specified threshold are selected.
RELIEF randomly samples instances from the training data. For every sampled
instance, RELIEF finds the nearest instance of the same class (the nearest
hit) and the nearest instance of the opposite class (the nearest miss). The
weight of each feature is updated according to how well its values
differentiate the sampled instance from its nearest hit and nearest miss. A
feature gains a high weight if it differentiates between instances from
different classes while taking the same value for instances of the same class.
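The sampling and weight-update loop above can be sketched as follows. This is a minimal illustration assuming numeric features scaled to [0, 1] and Manhattan distance for finding neighbors; the function and parameter names are illustrative, not taken from the original paper.

```python
import numpy as np

def relief(X, y, n_samples=100, seed=None):
    """Minimal sketch of RELIEF relevance weighting (binary-class case).
    Returns one weight per feature; higher means more relevant."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_samples):
        i = rng.integers(n)
        same = np.flatnonzero(y == y[i])
        same = same[same != i]                 # candidate hits
        diff = np.flatnonzero(y != y[i])       # candidate misses
        hit = same[np.argmin(np.abs(X[same] - X[i]).sum(axis=1))]
        miss = diff[np.argmin(np.abs(X[diff] - X[i]).sum(axis=1))]
        # a feature's weight grows when it separates the nearest miss
        # and shrinks when it separates the nearest hit
        w += np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])
    return w / n_samples
```

A feature that varies across classes but not within a class accumulates positive updates, while an irrelevant feature's updates average out near zero.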
13.3.1.6 Simba and G-flip
The SIMBA (iterative search margin-based algorithm) technique introduces
the idea of measuring the quality of a feature set by the margin it
induces. To overcome the drawbacks of iterative search, a greedy feature-flip
algorithm, G-flip, is used [Gilad-Bachrach et al. (2004)] to maximize
the margin function of a subset. The algorithm repeatedly iterates over the
feature set and updates the set of chosen features. In each iteration,
G-flip decides whether to add the current feature to the selected subset or
remove it from that subset by evaluating the margin with and without this
feature.
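A minimal sketch of this greedy flip loop is given below, using the sum of hypothesis margins (half the distance to the nearest miss minus the distance to the nearest hit, computed on the selected features) as the evaluation function. The helper names and the simple convergence test are assumptions of this sketch, not details from the paper.

```python
import numpy as np

def hypothesis_margin(X, y, mask):
    """Sum of per-instance hypothesis margins on the selected features."""
    if not mask.any():
        return -np.inf
    Z = X[:, mask]
    total = 0.0
    for i in range(len(Z)):
        d = np.linalg.norm(Z - Z[i], axis=1)
        d[i] = np.inf                      # exclude the instance itself
        hit = d[y == y[i]].min()           # nearest same-class instance
        miss = d[y != y[i]].min()          # nearest opposite-class instance
        total += 0.5 * (miss - hit)
    return total

def g_flip(X, y, max_passes=5, seed=None):
    """Greedy feature flip: keep a flip only if it improves the margin."""
    rng = np.random.default_rng(seed)
    mask = np.zeros(X.shape[1], dtype=bool)
    best = hypothesis_margin(X, y, mask)
    for _ in range(max_passes):
        changed = False
        for f in rng.permutation(X.shape[1]):
            mask[f] = not mask[f]          # tentatively flip feature f
            score = hypothesis_margin(X, y, mask)
            if score > best:
                best, changed = score, True  # keep the flip
            else:
                mask[f] = not mask[f]        # revert it
        if not changed:
            break                            # a full pass made no change
    return mask
```

Because every accepted flip strictly increases the margin, the loop terminates once a full pass over the features yields no improvement.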