must be mentioned in an FS topic review. Relief was proposed in [22] and selects
features that are statistically relevant. Although its goal is still to select features, it
does not explicitly generate feature subsets and test them like the methods reviewed
above. Instead, Relief samples instances and weights each feature individually,
without an explicit search for feature subsets. This follows the idea that relevant
features are those whose values can distinguish among instances that are close to
each other. Hence, for each given instance I, two nearest neighbors are found, one
from each class in a two-class problem: the so-called near-hit H, the nearest
neighbor from the same class as I, and the near-miss J, the nearest neighbor from
the other class. We expect a feature to be relevant if its values are
the same between I and H, and different between I and J. This check can be
carried out in terms of some distance between the feature's values, which should be
minimum for I and H and maximum for I and J. The distance of each feature for
each randomly chosen instance is accumulated in a weight vector w of the same
number of dimensions as the number of features. The relevant features are those
having their weights exceeding a relevance threshold τ, which can be statistically
estimated. The parameter m is the sample size and larger m produces a more reliable
approximation. The algorithm is presented in Algorithm 11. It does not fit into any
of the categories described in the previous section, although it evaluates a feature
using distance measures.
Algorithm 11 Relief algorithm.
function Relief(x - features, m - number of instances sampled, τ - relevance threshold)
    initialize: w = 0
    for i = 1 to m do
        randomly select an instance I
        find nearest-hit H and nearest-miss J
        for j = 1 to M do
            w(j) = w(j) - dist(j, I, H)^2 / m + dist(j, I, J)^2 / m    ▷ dist is a distance function
        end for
    end for
    return the features j with w(j) greater than τ
end function
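The steps of Algorithm 11 can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation. It assumes numeric features, a two-class label vector y, and a per-feature distance scaled by the feature's observed range. The function name relief, the seed parameter, and the toy data are hypothetical and do not come from the text. The near-hit distance is subtracted and the near-miss distance is added, in line with the description above.

```python
import random


def relief(X, y, m, tau, seed=0):
    """Sketch of two-class Relief. Returns (selected feature indices, weights)."""
    rng = random.Random(seed)
    n_features = len(X[0])

    # Assumption: scale each feature difference by the observed value range.
    spans = []
    for j in range(n_features):
        col = [row[j] for row in X]
        spans.append((max(col) - min(col)) or 1.0)

    def dist(j, a, b):
        # Per-feature distance between instances a and b, in [0, 1].
        return abs(a[j] - b[j]) / spans[j]

    def total(a, b):
        # Squared distance over all features, used to find nearest neighbors.
        return sum(dist(j, a, b) ** 2 for j in range(n_features))

    w = [0.0] * n_features
    for _ in range(m):
        i = rng.randrange(len(X))
        inst = X[i]
        hits = [X[k] for k in range(len(X)) if k != i and y[k] == y[i]]
        misses = [X[k] for k in range(len(X)) if y[k] != y[i]]
        near_hit = min(hits, key=lambda r: total(inst, r))
        near_miss = min(misses, key=lambda r: total(inst, r))
        for j in range(n_features):
            # Penalize differences to the near-hit, reward differences to the near-miss.
            w[j] += (dist(j, inst, near_miss) ** 2 - dist(j, inst, near_hit) ** 2) / m

    selected = [j for j in range(n_features) if w[j] > tau]
    return selected, w


# Toy data: feature 0 separates the classes, feature 1 is noise.
X = [[0.0, 0.3], [0.1, 0.9], [0.2, 0.1], [0.9, 0.8], [1.0, 0.2], [0.8, 0.5]]
y = [0, 0, 0, 1, 1, 1]
selected, w = relief(X, y, m=20, tau=0.25)
print(selected, w)
```

On this toy data the class-separating feature should receive the larger weight. The threshold 0.25 is chosen by hand for the illustration and is not a statistically estimated value.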
The main advantage of Relief is that it can handle both discrete and continuous data by
using distance measures that can work with categorical values. On the other hand,
its main weakness is that it is limited to two-class data, although some extensions
for multiple classes have been proposed, such as ReliefF [25].
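One way a per-feature distance can cover both discrete and continuous values is sketched below. The text does not specify the measure, so the conventions here are assumptions: a 0/1 mismatch for categorical values and a range-normalized absolute difference for numeric ones. The name feature_dist and its span parameter are hypothetical.

```python
def feature_dist(a, b, span=None):
    """Distance in [0, 1] between two values of a single feature.

    span is the feature's value range (max - min) for numeric features;
    pass None to treat the feature as categorical (assumed convention).
    """
    if span is None:
        # Categorical: equal values are at distance 0, different ones at 1.
        return 0.0 if a == b else 1.0
    if span == 0:
        # Constant numeric feature: it cannot distinguish any instances.
        return 0.0
    return abs(a - b) / span


print(feature_dist("red", "blue"))
print(feature_dist("red", "red"))
print(feature_dist(2.0, 5.0, span=10.0))
```

Because both branches return values in the same [0, 1] range, categorical and numeric features contribute to the weight vector on a comparable scale.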
7.5 Related and Advanced Topics
This section highlights some recent developments in FS and briefly
discusses related paradigms such as feature extraction (Sect. 7.5.2) and
feature construction (Sect. 7.5.3). It is worth mentioning that the current state of
 
 