13.3.1.7 Contextual Merit (CM) Algorithm
The CM algorithm [Hong (1997)] uses a merit function based upon weighted
distances between examples, which takes into account complete feature
correlations to the instance class. This approach assumes that features
should be weighted according to their discrimination power regarding
instances that are close to each other (based on the Euclidean distance)
but which are associated with different classes. The CM approach has been
used to select features for decision trees, and an experimental study shows
that feature subset selection can help to improve the prediction accuracy
of the induced classifier [Perner (2001)].
The notation $d_{r,s}$ represents the distance between the value of feature
$i$ in the instances $r$ and $s$ (i.e., the distance between $x_{r,i}$ and
$x_{s,i}$). For numerical attributes, the distance is
$\min\left(1, \frac{|x_{r,i} - x_{s,i}|}{t_i}\right)$, where $t_i$ is usually
0.5 of the value range of the attribute $i$. For nominal attributes, the
distance is 0 if $x_{r,i} = x_{s,i}$, and 1 otherwise. The contextual merit
for attribute $i$ is calculated as

$$M_i = \sum_{r=1}^{m} \; \sum_{s \in \{(x,y) \in S \mid y \neq y_r\}} w_{r,s} \, d_{r,s}$$

where $m$ is the training set size, $\{(x,y) \in S \mid y \neq y_r\}$ is the
set of instances associated with a different class than the instance $r$,
and $w_{r,s}$ is a weighting factor.
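
The merit computation described above can be sketched in a few lines of
Python. This is a minimal illustration, not the reference implementation of
[Hong (1997)]: the text does not define the weighting factor, so the sketch
takes it as a caller-supplied function `weight(r, s)` and defaults to a
constant weight of 1, which is purely a placeholder assumption. The names
`feature_distance` and `contextual_merit`, and the choice to use a threshold
of 1.0 when an attribute has zero range, are likewise assumptions made for
the example.

```python
def feature_distance(a, b, numeric, t):
    """Distance between two values of a single feature."""
    if numeric:
        # Numerical attribute: scaled absolute difference, capped at 1.
        return min(1.0, abs(a - b) / t)
    # Nominal attribute: 0 if the values match, 1 otherwise.
    return 0.0 if a == b else 1.0


def contextual_merit(X, y, numeric, weight=lambda r, s: 1.0):
    """Return one merit value per feature.

    X       -- list of instances, each a list of feature values
    y       -- list of class labels
    numeric -- list of booleans, True where the feature is numerical
    weight  -- hypothetical weighting function w(r, s); constant 1 by default
    """
    m = len(X)
    n_features = len(X[0])

    # t_i: 0.5 of the value range of each numerical attribute.
    t = []
    for i in range(n_features):
        if numeric[i]:
            column = [row[i] for row in X]
            half_range = 0.5 * (max(column) - min(column))
            # Assumption: fall back to 1.0 when the attribute is constant.
            t.append(half_range if half_range > 0 else 1.0)
        else:
            t.append(None)

    merit = [0.0] * n_features
    for r in range(m):
        for s in range(m):
            if y[s] == y[r]:
                continue  # only instances of a different class contribute
            w = weight(r, s)
            for i in range(n_features):
                d = feature_distance(X[r][i], X[s][i], numeric[i], t[i])
                merit[i] += w * d
    return merit
```

A call such as `contextual_merit(X, y, numeric=[True, False])` returns a list
with one merit per feature; features with larger merit separate differently
labeled instances more strongly under the chosen weighting.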
13.3.2 Using Traditional Statistics for Filtering
13.3.2.1 Mallows Cp
This method minimizes the mean square error of prediction [Mallows
(1973)]:

$$C_p = \frac{\mathrm{RSS}_\gamma}{\hat{\sigma}^2_{FULL}} + 2 q_\gamma - n, \qquad (13.2)$$

where $\mathrm{RSS}_\gamma$ is the residual sum of squares for the $\gamma$th
model and $\hat{\sigma}^2_{FULL}$ is the usual unbiased estimate of
$\sigma^2$ based on the full model.
The goal is to find the subset which has minimum $C_p$.
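
As a small worked sketch of Equation (13.2), the Python below scores a few
candidate subsets and picks the one with minimum $C_p$. It assumes that
$q_\gamma$ counts the fitted parameters of the $\gamma$th model (the text
does not define it) and that the unbiased variance estimate is the full
model's RSS divided by $n - q_{FULL}$. The function names and the RSS values
in the usage note are hypothetical, chosen only to illustrate the arithmetic.

```python
def mallows_cp(rss_gamma, sigma2_full, q_gamma, n):
    """Mallows Cp = RSS_gamma / sigma2_full + 2 * q_gamma - n."""
    return rss_gamma / sigma2_full + 2 * q_gamma - n


def select_min_cp(candidates, rss_full, q_full, n):
    """Pick the candidate with the smallest Cp.

    candidates -- dict mapping a subset name to (rss, q)
    rss_full   -- residual sum of squares of the full model
    q_full     -- number of fitted parameters in the full model (assumption)
    n          -- number of training instances
    """
    # Unbiased estimate of sigma^2 from the full model.
    sigma2_full = rss_full / (n - q_full)
    scores = {
        name: mallows_cp(rss, sigma2_full, q, n)
        for name, (rss, q) in candidates.items()
    }
    best = min(scores, key=scores.get)
    return best, scores
```

For instance, with `n = 50`, a full model with `q_full = 4` and
`rss_full = 46.0`, and hypothetical candidates
`{"x1": (120.0, 2), "x1,x2": (47.5, 3), "x1,x2,x3": (46.0, 4)}`, the scores
are 74.0, 3.5 and 4.0, so the two-feature subset is selected. Under these
assumptions the full model always scores exactly `q_full`, which gives a
quick sanity check.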
13.3.2.2 AIC, BIC and F-ratio
Akaike Information Criterion (AIC) and Bayesian Information Criterion
(BIC) are criteria for choosing a subset of features. Letting $l_\gamma$
denote the maximum log likelihood of the $\gamma$th model, AIC selects the
model which maximizes $(l_\gamma - q_\gamma)$, whereas BIC selects the model
which maximizes $(l_\gamma - (\log n)\, q_\gamma / 2)$.
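
The two criteria differ only in the penalty on $q_\gamma$, which the
following Python sketch makes explicit. It uses the "maximize" form given
above rather than the more common "minimize $-2l + \dots$" form. To obtain a
log likelihood from a residual sum of squares it assumes a linear model with
Gaussian errors, which the text does not state; `gaussian_max_loglik`,
`aic_score`, `bic_score` and `select_by_criteria` are hypothetical helper
names.

```python
import math


def gaussian_max_loglik(rss, n):
    """Maximum log likelihood of a Gaussian-error model (assumption).

    Uses the maximum-likelihood variance estimate rss / n.
    """
    sigma2_mle = rss / n
    return -0.5 * n * (math.log(2.0 * math.pi * sigma2_mle) + 1.0)


def aic_score(loglik, q):
    """AIC in the 'larger is better' form: l - q."""
    return loglik - q


def bic_score(loglik, q, n):
    """BIC in the 'larger is better' form: l - (log n) * q / 2."""
    return loglik - 0.5 * math.log(n) * q


def select_by_criteria(candidates, n):
    """candidates maps a subset name to (rss, q); returns (AIC pick, BIC pick)."""
    aic, bic = {}, {}
    for name, (rss, q) in candidates.items():
        loglik = gaussian_max_loglik(rss, n)
        aic[name] = aic_score(loglik, q)
        bic[name] = bic_score(loglik, q, n)
    return max(aic, key=aic.get), max(bic, key=bic.get)
```

Because $(\log n)/2$ exceeds 1 once $n \geq 8$, BIC penalizes each additional
parameter more heavily than AIC on all but very small training sets, and so
tends to prefer smaller subsets.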