(W3) w(X_ik) = w(X_il) if X_ik ~ X_il

(W4) Let AF be a set of features where ∀X_ik ∈ AF: (X_ik ∈ IF_i ∨ ∃X_il ∈ X_B: X_ik ~ X_il). Then ∀X_il ∈ X_B: (¬∃X_ik ∈ AF: X_il ~ X_ik) ⇒ w′(X_il) = w(X_il), where w′ is the weighting function for X′_B = X_B ∪ AF.
These conditions state that irrelevant features have weight 0 and that the sum of the weights of a set of alternative features must be constant, independent of the actual number of alternative features used. Together with the last conditions, this guarantees that a set of alternative features is not more important than a single feature. In the following we assume that for a modified space of base features X′_B the function w′ denotes the weighting function for X′_B according to the definition in (W4).
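To make these conditions concrete, the following is a minimal sketch (ours, not from this chapter; all names are hypothetical) that checks the sum constancy of (W2), the equal-weight requirement of (W3), and the (W4) invariance of untouched weights when a group of alternative features replaces a single base feature, assuming weights are given as plain dictionaries:

```python
def check_alternative_invariance(w_before, w_after, alt_group, tol=1e-9):
    """Check (W2)-(W4) for one group of mutually alternative features.

    w_before:  dict feature -> weight, before the alternatives were added
    w_after:   dict feature -> weight, after adding them
    alt_group: set of alternative features in w_after; exactly one of them
               (the original base feature) also appears in w_before
    """
    base = next(f for f in alt_group if f in w_before)
    # (W2): the group's total weight equals the original feature's weight
    sum_ok = abs(sum(w_after[f] for f in alt_group) - w_before[base]) < tol
    # (W3): all alternative features carry the same weight
    vals = [w_after[f] for f in alt_group]
    equal_ok = max(vals) - min(vals) < tol
    # (W4): features without alternatives keep their old weight
    rest_ok = all(abs(w_after[f] - w_before[f]) < tol
                  for f in w_before if f not in alt_group)
    return sum_ok and equal_ok and rest_ok

# x1 is duplicated into {x1, x1_copy}; a conforming weighting splits its weight.
print(check_alternative_invariance(
    {"x1": 0.7, "x2": 0.3},
    {"x1": 0.35, "x1_copy": 0.35, "x2": 0.3},
    {"x1", "x1_copy"}))                                   # True
```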
Additionally, we can define a set of conditions that must be met by distance measures for learning tasks that are based on feature weights only:
Distance Conditions 1 A distance measure d for learning tasks is a mapping d: T × T → ℝ+ which should fulfill at least the following conditions:

(D1) d(t_1, t_2) = 0 ⇔ t_1 = t_2
(D2) d(t_1, t_2) = d(t_2, t_1)
(D3) d(t_1, t_3) ≤ d(t_1, t_2) + d(t_2, t_3)
(D4) d(t_1, t_2) = d(t_1′, t_2′) if X′_B = X_B ∪ IF and IF ⊆ IF_1 ∩ IF_2
(D5) d(t_1, t_2) = d(t_1′, t_2′) if X′_B = X_B ∪ AF and ∀X_k ∈ AF: ∃X_l ∈ X_B: X_k ~ X_l

(D1)-(D3) represent the conditions for a metric. These conditions are required for efficient case retrieval and indexing, using, for example, M-Trees (Ciaccia, Patella, & Zezula, 1997). (D4) states that irrelevant features should not have an influence on the distance. Finally, (D5) states that adding alternative features should not have an influence on the distance.
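As an aside, the following is a minimal sketch (ours, not the actual M-Tree algorithm) of the kind of pruning that the metric conditions (D1)-(D3) enable during retrieval: with distances to a pivot precomputed at index-build time, the triangle inequality bounds d(query, o) from below, so many candidates can be discarded without computing their distance at all.

```python
def range_search(candidates, d, query, pivot, radius):
    """Range query in a metric space with pivot-based pruning.

    candidates: list of (object, d(pivot, object)) pairs, with the
                pivot distances precomputed at index-build time
    """
    d_qp = d(query, pivot)
    results = []
    for obj, d_po in candidates:
        # Triangle inequality (D3): d(query, obj) >= |d(query, pivot) - d(pivot, obj)|.
        if abs(d_qp - d_po) > radius:
            continue  # cannot lie inside the query ball; distance not computed
        if d(query, obj) <= radius:
            results.append(obj)
    return results

d = lambda a, b: abs(a - b)                  # 1-D metric for illustration
candidates = [(x, d(0.0, x)) for x in [0.1, 0.5, 2.0, 3.5]]
print(range_search(candidates, d, query=0.4, pivot=0.0, radius=0.3))  # [0.1, 0.5]
```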
Negative Results

In this section we will show that many feature weighting approaches do not fulfill the conditions (W1)-(W4). Furthermore, one of the most popular distance measures, the Euclidean distance, cannot be used as a learning task distance measure.
Lemma 1 No feature selection method fulfills the conditions (W1)-(W4).

Proof: For a feature selection method, weights are always binary, that is, w(X_ik) ∈ {0, 1}. We assume a learning task t_i with no alternative features and X′_B = X_B ∪ {X_ik} with ∃X_il ∈ X_B: X_il ~ X_ik. Then either w′(X_il) = w′(X_ik) = w(X_il) = 1, leading to a contradiction with (W2), or w′(X_il) ≠ w′(X_ik), leading to a contradiction with (W3).
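The case analysis can also be enumerated directly. Below is a small sketch (our illustration, assuming the original feature x had weight 1 before its alternative x2 was added, so (W2) demands that the pair's total weight stays 1): every binary weight assignment violates (W2) or (W3).

```python
# All binary weight assignments for a feature x and its added alternative x2,
# assuming w(x) = 1 before x2 was added, so (W2) demands w'(x) + w'(x2) == 1.
for w_x, w_x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    broken = []
    if w_x != w_x2:
        broken.append("(W3): alternatives must share one weight")
    if w_x + w_x2 != 1:
        broken.append("(W2): the group's total weight changed")
    print(f"w'(x)={w_x}, w'(x2)={w_x2} -> violates {broken}")
```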
Lemma 2 Any feature weighting method for which w(X_ik) is calculated independently of X_B \ {X_ik} does not fulfill the conditions (W1)-(W4).

Proof: We assume a learning task t_i with no alternative features and X′_B = X_B ∪ {X_ik} with ∃X_il ∈ X_B: X_il ~ X_ik. If the weight of each feature is calculated independently of all other features, adding X_ik does not change the weight w′(X_il) in the new feature space X′_B. From (W3) it follows that w′(X_ik) = w′(X_il) = w(X_il), so the pair of alternatives {X_ik, X_il} has total weight 2w(X_il) instead of w(X_il), which is a violation of (W2).
Lemma 2 essentially covers all feature weighting methods that treat features independently. Important examples of such weighting schemes are information gain (Quinlan, 1986) and Relief (Kira & Rendell, 1992).
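To see the failure concretely, here is a toy illustration (a hand-rolled information gain over discrete features, not Quinlan's full formulation): because each feature's score is computed from that feature and the labels alone, an exact copy of a feature receives the same weight as the original, so the weight of the alternative group doubles instead of staying constant, violating (W2).

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature, labels):
    """Information gain of a discrete feature, computed from this feature
    and the labels alone, i.e., independently of all other features."""
    total = entropy(labels)
    n = len(labels)
    for v in set(feature):
        subset = [l for f, l in zip(feature, labels) if f == v]
        total -= len(subset) / n * entropy(subset)
    return total

labels = [0, 0, 1, 1]
x1     = [0, 0, 1, 1]   # perfectly informative feature, gain = 1.0 bit
x1_alt = list(x1)       # an exact copy, i.e., an alternative feature

# Each copy keeps the full gain, so the pair's total weight is 2.0
# instead of 1.0 -- the (W2) violation of Lemma 2.
print(information_gain(x1, labels), information_gain(x1_alt, labels))
```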
The next theorem states that the Euclidean
distance cannot be used as a distance measure
based on feature weights.
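Ahead of the formal statement, a numeric sketch of the underlying effect may help (our illustration, assuming tasks are compared via the Euclidean distance between their weight vectors, and that (W2) and (W3) force a feature's weight to be split evenly over its k alternatives): replacing one feature by k alternatives shrinks each coordinate difference by k while adding only k squared terms, so the distance shrinks by a factor of √k and (D5) is violated.

```python
import math

def euclidean(w1, w2):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(w1, w2)))

# Two tasks represented by weight vectors over a single base feature.
print(euclidean([0.8], [0.2]))            # 0.6

# Replace that feature by k = 4 alternatives; (W2) and (W3) force each
# copy to carry 1/k of the original weight.
k = 4
w1 = [0.8 / k] * k
w2 = [0.2 / k] * k
print(euclidean(w1, w2))                  # 0.6 / sqrt(4) = 0.3, so (D5) fails
```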