(W3) w(X_ik) = w(X_il) if X_ik ~ X_il
(W4) Let AF be a set of features where ∀X_ik ∈ AF: (X_ik ∈ IF_i ∨ ∃X_il ∈ X_B: X_ik ~ X_il). Then

∀X_il ∈ X_B: (¬∃X_ik ∈ AF: X_il ~ X_ik) ⇒ w′(X_il) = w(X_il),

where w′ is the weighting function for X′_B = X_B ∪ AF.
These conditions state that irrelevant features have weight 0 and that the sum of the weights of alternative features must be constant, independently of the actual number of alternative features used. Together with the last condition, it is guaranteed that a set of alternative features is not more important than a single feature. In the following we assume that for a modified space of base features X′_B, the function w′ denotes the weighting function for X′_B according to the definition in (W4).
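To make the interplay of these conditions concrete, here is a minimal Python sketch (not from the source) of a hypothetical weighting scheme that spreads each group's relevance evenly over its alternative features; the feature names, groups, and the make_weights helper are invented for illustration:

```python
# Illustrative sketch of (W1)-(W4), assuming an invented weighting scheme.
# Each feature is mapped to its group of mutual alternatives ("~" classes);
# irrelevant features form the group None.
def make_weights(groups, group_relevance):
    """Assign each feature the relevance of its group divided by the
    number of alternative features in that group (irrelevant -> 0)."""
    sizes = {}
    for g in groups.values():
        sizes[g] = sizes.get(g, 0) + 1
    return {x: (0.0 if g is None else group_relevance[g] / sizes[g])
            for x, g in groups.items()}

# Base space X_B: X1 and X2 are alternatives (group "a"), X3 is irrelevant.
groups = {"X1": "a", "X2": "a", "X3": None}
w = make_weights(groups, {"a": 1.0})

assert w["X3"] == 0.0            # (W1): irrelevant features get weight 0
assert w["X1"] == w["X2"]        # (W3): alternative features get equal weights

# Extend the space by another alternative of group "a": X'_B = X_B u AF.
groups2 = dict(groups, X4="a")
w2 = make_weights(groups2, {"a": 1.0})

# (W2): the summed weight of a group of alternatives stays constant.
assert abs(sum(w2[x] for x in ("X1", "X2", "X4")) - 1.0) < 1e-9

# (W4): features untouched by AF keep their old weight.
assert w2["X3"] == w["X3"]
print(w, w2)
```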
Additionally, we can define a set of conditions that must be met by distance measures for learning tasks that are based on feature weights only:
Distance Conditions 1. A distance measure d for learning tasks is a mapping d: T × T → ℝ⁺ which should fulfill at least the following conditions:

(D1) d(t_1, t_2) = 0 ⇔ t_1 = t_2
(D2) d(t_1, t_2) = d(t_2, t_1)
(D3) d(t_1, t_3) ≤ d(t_1, t_2) + d(t_2, t_3)
(D4) d(t_1, t_2) = d(t_1′, t_2′) if X′_B = X_B ∪ IF and IF ⊆ IF_1 ∩ IF_2
(D5) d(t_1, t_2) = d(t_1′, t_2′) if X′_B = X_B ∪ AF and ∀X_k ∈ AF: ∃X_l ∈ X_B: X_k ~ X_l

(D1)-(D3) represent the conditions for a metric. These conditions are required for efficient case retrieval and indexing, using, for example, M-Trees (Ciaccia, Patella, & Zezula, 1997). (D4) states that irrelevant features should not have an influence on the distance. Finally, (D5) states that adding alternative features should not have an influence on the distance.
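As a small illustration of (D4), consider comparing tasks through their weight vectors (a sketch under assumptions, not the measure proposed here): if the weighting fulfills (W1), an added irrelevant feature only appends zero entries to both vectors, so an entrywise distance such as the Manhattan distance is unchanged.

```python
# Minimal sketch (invented weight vectors): represent each task by its
# feature weight vector and compare tasks with a Manhattan distance.
def manhattan(w1, w2):
    return sum(abs(a - b) for a, b in zip(w1, w2))

w_t1, w_t2 = [1.0, 0.0], [0.0, 1.0]   # weights over X_B
d_before = manhattan(w_t1, w_t2)

# (D4): extend X_B by a feature that is irrelevant for both tasks.
# By (W1) it receives weight 0 in both tasks, so the distance is unchanged.
w_t1_ext, w_t2_ext = w_t1 + [0.0], w_t2 + [0.0]
assert manhattan(w_t1_ext, w_t2_ext) == d_before
```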
Negative Results

In this section we will show that many feature weighting approaches do not fulfill the conditions (W1)-(W4). Furthermore, one of the most popular distance measures, the Euclidean distance, cannot be used as a learning task distance measure.

Lemma 1. No feature selection method fulfills the conditions (W1)-(W4).
Proof: For a feature selection method, weights are always binary, that is, w(X_ik) ∈ {0, 1}. We assume a learning task t_i with no alternative features and X′_B = X_B ∪ {X_ik} with ∃X_il ∈ X_B: X_il ~ X_ik. Then either w′(X_il) = w′(X_ik) = w(X_il) = 1, so that the weights of the two alternative features sum to 2 instead of 1, leading to a contradiction with (W2), or w′(X_il) ≠ w′(X_ik), leading to a contradiction with (W3).
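The case distinction of the proof can be replayed numerically; this hedged sketch enumerates all binary weight assignments for a feature A and a duplicated alternative A2 (names invented) and shows that each of them violates (W2) or (W3):

```python
from itertools import product

# Lemma 1 replayed numerically (illustrative sketch).
# Base space: feature A with weight 1. We add an alternative A2 ~ A.
w_A = 1.0

violations = []
for w2_A, w2_A2 in product([0, 1], repeat=2):   # selection: binary weights
    if w2_A + w2_A2 != w_A:                     # (W2): group sum must stay 1
        violations.append(((w2_A, w2_A2), "W2"))
    elif w2_A != w2_A2:                         # (W3): alternatives get equal weights
        violations.append(((w2_A, w2_A2), "W3"))

# Every one of the four binary assignments violates (W2) or (W3):
assert len(violations) == 4
for assignment, cond in violations:
    print(assignment, "violates", cond)
```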
Lemma 2. No feature weighting method for which w(X_ik) is calculated independently of X_B \ {X_ik} fulfills the conditions (W1)-(W4).
Proof: We assume a learning task t_i with no alternative features and X′_B = X_B ∪ {X_ik} with ∃X_il ∈ X_B: X_il ~ X_ik. If w is independent of X_B \ {X_ik}, adding X_ik would not change the weight w′(X_il) in the new feature space X′_B. From (W3) it follows that w′(X_ik) = w′(X_il) = w(X_il), which is a violation of (W2), because the summed weight of the alternative features doubles.
Lemma 2 essentially covers all feature weighting methods that treat features independently. Important examples of such weighting schemes are information gain (Quinlan, 1986) and Relief (Kira & Rendell, 1992).
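To see Lemma 2 in action for information gain, the following sketch (invented toy data, hand-rolled gain computation) duplicates a perfectly informative feature: each copy keeps the full gain, so the summed weight of the two alternatives doubles and (W2) is violated.

```python
from math import log2
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def information_gain(feature, labels):
    """Entropy of the labels minus the entropy after splitting on feature."""
    n = len(labels)
    split = {}
    for v, y in zip(feature, labels):
        split.setdefault(v, []).append(y)
    remainder = sum(len(part) / n * entropy(part) for part in split.values())
    return entropy(labels) - remainder

# Toy task (invented data): X1 predicts y perfectly, X2 is noise.
X1 = [0, 0, 1, 1]
X2 = [0, 1, 0, 1]
y  = [0, 0, 1, 1]

w = {"X1": information_gain(X1, y), "X2": information_gain(X2, y)}

# Add an exact copy of X1: an alternative feature X1b ~ X1. Because the
# gain of each feature is computed independently of all others, X1 and X1b
# both keep the full weight, so the group sum doubles and (W2) is violated.
w2 = dict(w, X1b=information_gain(list(X1), y))
assert w2["X1"] == w2["X1b"] == w["X1"]   # per-feature weights unchanged
print("group sum before:", w["X1"], "after:", w2["X1"] + w2["X1b"])
```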
The next theorem states that the Euclidean
distance cannot be used as a distance measure
based on feature weights.
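The core of the problem can already be seen numerically: splitting a weight of 1 evenly over two alternative features, as (W2) and (W3) demand, changes the Euclidean norm of the difference of the weight vectors, in violation of (D5). The following sketch uses invented weight vectors for two tasks t1 and t2.

```python
from math import sqrt, isclose

def euclidean(w1, w2):
    return sqrt(sum((a - b) ** 2 for a, b in zip(w1, w2)))

# Task t1 weights feature A, task t2 weights feature B (invented example).
w_t1, w_t2 = [1.0, 0.0], [0.0, 1.0]

# Add an alternative A2 ~ A. By (W2)/(W3), t1's weight of 1 is split evenly
# over A and A2, while t2 still assigns both of them weight 0.
w_t1_ext, w_t2_ext = [0.5, 0.5, 0.0], [0.0, 0.0, 1.0]

d_before = euclidean(w_t1, w_t2)          # sqrt(2)   ~ 1.414
d_after  = euclidean(w_t1_ext, w_t2_ext)  # sqrt(1.5) ~ 1.225

# The distance changes although only an alternative feature was added,
# so the Euclidean distance on weight vectors violates (D5).
assert not isclose(d_before, d_after)
print(d_before, d_after)
```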