Theorem 3 Euclidean distance does not fulfill
the conditions (D1)-(D5).
Proof: We give a counterexample. We assume that a weighting function w is given which fulfills the conditions (W1)-(W4). Further assume that learning tasks t_i, t_j are given which contain no alternative features. We add an alternative feature X_il for X_ik ∈ X_B, i.e., X_il ~ X_ik, and get X'_B = X_B ∪ {X_il}. We infer from conditions (W2) and (W3) that
$$w'(X_{ik}) = w'(X_{il}) = \frac{w(X_{ik})}{2} \quad\text{and}\quad w'(X_{jk}) = w'(X_{jl}) = \frac{w(X_{jk})}{2}$$
and from condition (W4) that
$$\forall p \neq k:\ w'(X_{ip}) = w(X_{ip}) \quad\text{and}\quad \forall p \neq k:\ w'(X_{jp}) = w(X_{jp}).$$

In this case the following holds for the Euclidean distance:
$$d'(t_i, t_j)^2 = S + 2\left(\frac{w(X_{ik})}{2} - \frac{w(X_{jk})}{2}\right)^2 = S + \frac{1}{2}\left(w(X_{ik}) - w(X_{jk})\right)^2 \neq S + \left(w(X_{ik}) - w(X_{jk})\right)^2 = d(t_i, t_j)^2$$
with

$$S = \sum_{p=1,\ p \neq k}^{|X_B|} \left(w(X_{ip}) - w(X_{jp})\right)^2 = \sum_{p=1,\ p \neq k}^{|X_B|} \left(w'(X_{ip}) - w'(X_{jp})\right)^2.$$
Hence, alternative features here do make a difference, violating (D5).
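The effect can be checked numerically. The following sketch (ours, not from the chapter) computes the Euclidean distance between two tasks from made-up feature weights, then splits the weight of one feature over an added alternative as required by (W2) and (W3); the concrete weight values are illustrative assumptions.

```python
# Numeric check of the counterexample: adding an alternative feature changes the
# Euclidean distance between the weight vectors, although the tasks did not change.
import numpy as np

def euclid(a, b):
    return np.sqrt(np.sum((a - b) ** 2))

# made-up weights w(X_ip) and w(X_jp); the last entry plays the role of X_ik / X_jk
w_i = np.array([0.5, 0.3, 0.2])
w_j = np.array([0.4, 0.2, 0.4])
d_before = euclid(w_i, w_j)                # sqrt(S + (0.2 - 0.4)^2) ~= 0.245

# add the alternative feature X_il: the weight of X_ik is split evenly over both
# copies ((W2), (W3)); all other weights stay unchanged (W4)
w_i_alt = np.array([0.5, 0.3, 0.1, 0.1])
w_j_alt = np.array([0.4, 0.2, 0.2, 0.2])
d_after = euclid(w_i_alt, w_j_alt)         # sqrt(S + 0.5 * (0.2 - 0.4)^2) = 0.2

print(d_before, d_after)                   # the two distances differ, violating (D5)
```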
positive results

We have shown that many common feature weighting algorithms and distance measures cannot be used to define a distance between learning tasks in our scenario. In this section, we will prove that the feature weights delivered by a linear Support Vector Machine (SVM) obey the proposed weighting conditions. Afterwards, we also discuss a distance measure fulfilling the distance conditions.

Support Vector Machines are based on the work of Vapnik in statistical learning theory (Vapnik, 1995). They aim to minimize the regularized risk R_reg(f) of a learned function f, which is the weighted sum of the empirical risk R_emp(f) and a complexity term ||w||²:

$$R_{reg}(f) = R_{emp}(f) + \lambda \|w\|^2$$

The result is a linear decision function y = sgn(w · x + b) with a minimal length of w. The vector w is the normal vector of an optimal hyperplane with a maximal margin to both classes. One of the strengths of SVMs is the use of kernel functions, which extend the feature space and allow linear decision boundaries after efficient nonlinear transformations of the input (Schölkopf and Smola, 2002). Since our goal is the construction of (nonlinear) features during preprocessing, we can simply use the simplest kernel function, the dot product. In this case the components of the vector w can be interpreted as weights for all features.
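To make this interpretation concrete, the following sketch (ours, not from the chapter) trains a linear SVM with scikit-learn and reads the feature weights off the learned weight vector; the synthetic data, the added noise feature, and all names are illustrative assumptions. With enough examples, the weight of the irrelevant noise feature comes out much smaller than the weights of the informative features, in line with condition (W1).

```python
# Reading feature weights off a linear SVM (illustrative sketch using scikit-learn).
import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(0)
informative = rng.normal(size=(200, 2))            # two informative features
labels = (informative[:, 0] + informative[:, 1] > 0).astype(int)
noise = rng.normal(size=(200, 1))                  # irrelevant feature, unrelated to the labels
X = np.hstack([informative, noise])

svm = SVC(kernel="linear", C=1.0).fit(X, labels)   # dot-product kernel, so w lives in the input space
w = svm.coef_.ravel()                              # components of w, interpretable as feature weights
print(w)                                           # the third weight (noise) is close to 0
```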
 
Theorem 4 The feature weight calculation of SVMs with linear kernel function meets the conditions (W1)-(W4).

Proof: Since these conditions can be proved for a single learning task t_i, we write X_k and w_k as a shortcut for X_ik and w(X_ik).

(W1) Sketch: We assume that the SVM finds an optimal hyperplane. The algorithm tries to minimize both the length of w and the empirical error. This naturally corresponds to a maximum margin hyperplane where the weights of irrelevant features are 0 if enough data points are given.

(W2) SVMs find the optimal hyperplane by minimizing the weight vector w. The classification with the optimal hyperplane and weight vector w can be written as y = sgn(w_1 x_1 + … + w_i x_i + … + w_m x_m + b). We will show that this vector cannot be changed by adding the same feature more than once.
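The argument for (W2) can also be checked empirically. The sketch below (ours, not part of the proof) trains a linear SVM twice with scikit-learn, once on the original data and once with one feature duplicated; the data and all names are illustrative assumptions. The weight of the duplicated feature is split roughly in half over the two identical copies while the remaining weights stay essentially unchanged, as conditions (W2) and (W3) demand.

```python
# Duplicating a feature splits its weight between the two copies (illustrative sketch).
import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(1)
X = rng.normal(size=(200, 3))
y = (X[:, 0] - 0.5 * X[:, 2] > 0).astype(int)

w_orig = SVC(kernel="linear", C=1.0).fit(X, y).coef_.ravel()

X_dup = np.hstack([X, X[:, [0]]])                  # append an exact copy of feature 0
w_dup = SVC(kernel="linear", C=1.0).fit(X_dup, y).coef_.ravel()

print(w_orig)                                      # [w_0, w_1, w_2]
print(w_dup)                                       # approximately [w_0/2, w_1, w_2, w_0/2]
```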