data is used as the predicted class; otherwise, the majority class of all
matching instances is used. In such cases, the minimum description length
(MDL) principle guides the search by estimating the cost of encoding a
decision table and the training examples it misclassifies with respect to a
given feature subset. The features in the final decision table are then used
with other learning algorithms.
13.3.1.4 An Information Theoretic Feature Filter
There are many filter techniques based on information theory and
probabilistic reasoning [Koller and Sahami (1996)]. The rationale behind
this approach is that, since the goal of an induction algorithm is to estimate
the probability distribution over the class values given the original feature
set, feature subset selection should yield a subset whose induced class
distributions remain as close as possible to the original ones.
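As a rough illustration of this criterion, the sketch below estimates, for discrete features, the expected KL divergence between the class distribution conditioned on the full feature set and the one conditioned on a candidate subset (smaller is better). The function names and the naive empirical estimation are assumptions of this sketch, not the exact procedure of Koller and Sahami.

```python
import numpy as np

def expected_kl(X, y, subset, eps=1e-9):
    """Average KL divergence between P(C | all features) and P(C | subset),
    estimated empirically over the training instances (discrete features).
    Illustrative sketch only; `eps` avoids log(0) on deterministic cells."""
    classes = np.unique(y)

    def class_dists(keys):
        # empirical P(class | key) for each distinct feature-value tuple
        dists = {}
        for k in set(keys):
            idx = np.array([i for i, kk in enumerate(keys) if kk == k])
            dists[k] = np.array([(y[idx] == c).mean() for c in classes])
        return dists

    full_keys = [tuple(row) for row in X]
    sub_keys = [tuple(row) for row in X[:, subset]]
    p_full, p_sub = class_dists(full_keys), class_dists(sub_keys)

    total = 0.0
    for fk, sk in zip(full_keys, sub_keys):
        p, q = p_full[fk] + eps, p_sub[sk] + eps
        total += float(np.sum(p * np.log(p / q)))
    return total / len(y)
```

Under this criterion, a subset that fully determines the class (KL near zero) is preferred over one that washes the class distribution out toward the prior.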
13.3.1.5 RELIEF Algorithm
RELIEF [Kira and Rendell (1992)] uses instance-based learning to assign
a relevance weight to each feature. The weight of each feature reflects its
ability to distinguish between the class values. The features are ranked by
their weights, and those exceeding a user-specified threshold are selected.
RELIEF randomly samples instances from the training data. For every sampled
instance, RELIEF finds the nearest instance of the same class (the nearest
hit) and the nearest instance of the opposite class (the nearest miss). The
weight of each feature is updated according to how well its values
differentiate the sampled instance from its nearest hit and nearest miss. A
feature gains a high weight if it differentiates between instances from
different classes while taking the same value for instances of the same class.
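The sampling and weight-update loop above can be sketched as follows. This is a minimal illustration assuming numeric features scaled to [0, 1] and Manhattan distance for finding neighbors; the function and parameter names are illustrative, not taken from the original paper.

```python
import numpy as np

def relief(X, y, n_samples=100, seed=None):
    """Minimal sketch of RELIEF relevance weighting (binary-class case).
    Returns one weight per feature; higher means more relevant."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_samples):
        i = rng.integers(n)
        same = np.flatnonzero(y == y[i])
        same = same[same != i]                 # candidate hits
        diff = np.flatnonzero(y != y[i])       # candidate misses
        hit = same[np.argmin(np.abs(X[same] - X[i]).sum(axis=1))]
        miss = diff[np.argmin(np.abs(X[diff] - X[i]).sum(axis=1))]
        # a feature's weight grows when it separates the nearest miss
        # and shrinks when it separates the nearest hit
        w += np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])
    return w / n_samples
```

A feature that varies across classes but not within a class accumulates positive updates, while an irrelevant feature's updates average out near zero.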
13.3.1.6 Simba and G-flip
The SIMBA (iterative search margin-based algorithm) technique introduces
the idea of measuring the quality of a feature set by the margin it
induces. To overcome the drawbacks of iterative search, a greedy feature-flip
algorithm, G-flip, is used [Gilad-Bachrach et al. (2004)] to maximize
the margin function of a subset. The algorithm repeatedly iterates over the
feature set and updates the set of chosen features. In each iteration,
G-flip decides whether to add the current feature to the selected subset or
remove it from that subset by evaluating the margin with and without this
feature.
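A minimal sketch of this greedy flip loop is given below, using the sum of hypothesis margins (half the distance to the nearest miss minus the distance to the nearest hit, computed on the selected features) as the evaluation function. The helper names and the simple convergence test are assumptions of this sketch, not details from the paper.

```python
import numpy as np

def hypothesis_margin(X, y, mask):
    """Sum of per-instance hypothesis margins on the selected features."""
    if not mask.any():
        return -np.inf
    Z = X[:, mask]
    total = 0.0
    for i in range(len(Z)):
        d = np.linalg.norm(Z - Z[i], axis=1)
        d[i] = np.inf                      # exclude the instance itself
        hit = d[y == y[i]].min()           # nearest same-class instance
        miss = d[y != y[i]].min()          # nearest opposite-class instance
        total += 0.5 * (miss - hit)
    return total

def g_flip(X, y, max_passes=5, seed=None):
    """Greedy feature flip: keep a flip only if it improves the margin."""
    rng = np.random.default_rng(seed)
    mask = np.zeros(X.shape[1], dtype=bool)
    best = hypothesis_margin(X, y, mask)
    for _ in range(max_passes):
        changed = False
        for f in rng.permutation(X.shape[1]):
            mask[f] = not mask[f]          # tentatively flip feature f
            score = hypothesis_margin(X, y, mask)
            if score > best:
                best, changed = score, True  # keep the flip
            else:
                mask[f] = not mask[f]        # revert it
        if not changed:
            break                            # a full pass made no change
    return mask
```

Because every accepted flip strictly increases the margin, the loop terminates once a full pass over the features yields no improvement.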