one time. We assume that all alternative features
can be transformed into identical features by
normalizing the data. Adding k - 1 alternative
features will result in:
$$y = \operatorname{sgn}\Big(\ldots + \underbrace{\big(w_i^1 + \ldots + w_i^k\big)}_{\text{alternative features}}\, x_i + \ldots + b\Big).$$
However, the optimal hyperplane will remain the same and does not depend on the number of alternative attributes. This means that the other values $w_j$ will not be changed. This leads to

$$\sum_{l=1}^{k} w_i^l = \hat{w}_i,$$
which proves condition (W2).
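To make (W2) concrete, here is a minimal empirical sketch (my illustration, not from the text; the data and settings are invented) using a linear SVM from scikit-learn: duplicating a feature should leave the hyperplane essentially unchanged, with the copies' weights summing to roughly the original weight. Soft-margin training only approximates the optimal hyperplane assumed in the proof, so the equalities hold approximately.

```python
# Sketch: check condition (W2) on synthetic data (all names invented).
import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(0)
X = rng.randn(200, 3)                          # three base features
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # labels from features 0 and 1

w_hat = SVC(kernel="linear", C=1.0).fit(X, y).coef_[0]  # weights, no copies

X_alt = np.hstack([X, X[:, [0]]])              # add one alternative of feature 0
w = SVC(kernel="linear", C=1.0).fit(X_alt, y).coef_[0]

print(w_hat[0], w[0] + w[3])  # (W2): copies' weights sum to ~ original weight
print(w[0], w[3])             # previews (W3): copies share the weight evenly
```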
(W3) The SVM optimization minimizes the length of the weight vector w. This can be written as

$$w_1^2 + \ldots + w_i^2 + \ldots + w_m^2 \stackrel{!}{=} \min.$$

We replace $w_i$ using condition (W2):

$$w_1^2 + \ldots + \Big(\hat{w} - \sum_j w_j\Big)^2 + \ldots + w_m^2 \stackrel{!}{=} \min.$$

In order to find the minimum we have to partially differentiate the last equation for all weights $w_k$:

$$\frac{\partial}{\partial w_k}\Big(\ldots + \big(\hat{w} - \sum_j w_j\big)^2 + w_k^2 + \ldots\Big) = 2 w_k - 2\Big(\hat{w} - \sum_j w_j\Big) = 0,$$

hence

$$w_k + \sum_j w_j = \hat{w}.$$

The sum on the left side contains another $w_k$. This leads to a system of linear equations of the form

$$\ldots + 0 \cdot w_i + \ldots + 2 w_k + \ldots = \hat{w}.$$

Solving this system of equations leads to $w_p = w_q$ (condition (W3)).
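As a worked instance of this argument (my addition), take k = 2 alternative weights $w_1, w_2$ with $w_1 + w_2 = \hat{w}$:

$$w_1^2 + w_2^2 \stackrel{!}{=} \min, \qquad w_2 = \hat{w} - w_1,$$

$$\frac{d}{d w_1}\big(w_1^2 + (\hat{w} - w_1)^2\big) = 4 w_1 - 2\hat{w} = 0 \;\Rightarrow\; w_1 = w_2 = \frac{\hat{w}}{2},$$

i.e. the two alternatives share the original weight equally, exactly as (W3) claims.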
(W4) Sketch: We again assume that a SVM finds an optimal hyperplane given enough data points. Since condition (W1) holds, adding an irrelevant feature would not change the hyperplane, and thus the weighting vector w for the base features will remain. The proofs of conditions (W2) and (W3) state that the optimal hyperplane is not affected by alternative features either.

In order to calculate the distance of learning tasks based only on a set of base feature weights, we still need a distance measure that meets the conditions (D1)-(D5).

Theorem 5: Manhattan distance fulfills the conditions (D1)-(D5).

Proof: The conditions (D1)-(D3) are fulfilled due to basic properties of the Manhattan distance. Therefore, we only give proofs for conditions (D4) and (D5).
(D4) It follows from the definition of the Manhattan distance that
$$d(t'_i, t'_j) = \sum_{X_{ip},\, X_{jp} \in X_B} \big| w'_i(X_{ip}) - w'_j(X_{jp}) \big| \;+ \sum_{X_{iq},\, X_{jq} \in X_{IF}} \big| w'_i(X_{iq}) - w'_j(X_{jq}) \big|$$

$$= \sum_{X_{ip},\, X_{jp} \in X_B} \big| w_i(X_{ip}) - w_j(X_{jp}) \big| + 0 \;=\; d(t_i, t_j)$$
from (W4).
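As an invented numeric check: for base weights $w_i = (0.6, 0.4)$ and $w_j = (0.2, 0.8)$, an appended irrelevant feature receives weight 0 on both tasks and therefore adds nothing:

$$d(t'_i, t'_j) = |0.6 - 0.2| + |0.4 - 0.8| + |0 - 0| = 0.8 = d(t_i, t_j).$$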
(D5) Sketch: We show the case of adding k features with $\forall X_{ik}: X_{ik} \sim X_{il}$ for a fixed $X_{il} \in X_B$:
$$d(t'_i, t'_j) = \sum_{\substack{X_{ip},\, X_{jp} \in X_B \\ p \neq k}} \big| w'_i(X_{ip}) - w'_j(X_{jp}) \big| + (k+1)\, \big| w'_i(X_{ik}) - w'_j(X_{jk}) \big|$$

By (W2) and (W3), each of the $k+1$ alternative copies carries the weight $w_i(X_{ik})/(k+1)$, so the factor $(k+1)$ cancels:

$$= \sum_{\substack{X_{ip},\, X_{jp} \in X_B \\ p \neq k}} \big| w_i(X_{ip}) - w_j(X_{jp}) \big| + \big| w_i(X_{ik}) - w_j(X_{jk}) \big| \;=\; d(t_i, t_j)$$

from (W4) and (W2).
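The following sketch (my illustration; the helper names `manhattan` and `split` are hypothetical, not from the text) reruns the (D4) and (D5) bookkeeping numerically, reusing the weight vectors from the (D4) example: an irrelevant feature adds $|0-0| = 0$, and splitting a weight over $k+1$ alternative copies cancels against the factor $(k+1)$.

```python
# Sketch: Manhattan task distance is invariant under irrelevant (D4)
# and alternative (D5) features; all values are invented.
import numpy as np

def manhattan(w_a, w_b):
    return np.abs(np.asarray(w_a) - np.asarray(w_b)).sum()

w_i = np.array([0.6, 0.4])    # base feature weights of task t_i
w_j = np.array([0.2, 0.8])    # base feature weights of task t_j
d = manhattan(w_i, w_j)       # 0.8

# (D4): irrelevant features get weight 0 by (W1)
d_irr = manhattan(np.append(w_i, 0.0), np.append(w_j, 0.0))

# (D5): k alternatives of feature 0; by (W2)/(W3) each of the
# k + 1 copies carries w[0] / (k + 1)
k = 3
def split(w, k):
    return np.concatenate([np.full(k + 1, w[0] / (k + 1)), w[1:]])

d_alt = manhattan(split(w_i, k), split(w_j, k))
print(d, d_irr, d_alt)        # all three print 0.8
```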