Machine Learning - Learning to Rank for Information Retrieval

Information Technology Reference

In-Depth Information

The Lagrangian of the above optimization problem is

α i y i w T x i +

b −

ξ i .

L =

−

By setting the derivative of the Lagrangian to be zero, the optimization problem

can be written in its dual form (in terms of α i ),

max

2 α i α j y i y j x i

α i −

x j ,

s.t.

α i y i =

0 , i ≥

0 .

This is a typical quadratic programming problem. After solving it, we can recover

the ranking model w = i α i y i x i .

If we use the Karush-Kuhn-Tuker (KTT) conditions to analyze the optimality

of the solutions to the above optimization problem, we will come to the following

conclusion:

y i w T x i + b ≥

α i =

⇒

1 ,

y i w T x i +

b =

0 <α i <C

⇒

1 ,

y i w T x i +

b ≤

α i =

⇒

1 .

That is, the hyperplane w is the linear combination of a small number of samples.

These samples x i with non-zero α i are called support vectors. This is why we have

the name “support vector machines” for the algorithm.

Note that we have explained support vector machines using the linear model as

an example. One can replace the inner product x i x j in the above discussions with

a kernel function K(x i ,x j ) to extend the algorithm to the non-linear case. Typical

kernel functions include the polynomial kernel and the RBF kernel. Kernel functions

can also be regarded as similarity measures between input objects, just as the inner

product. The well-known Mercer's condition states that any positive semi-definite

kernel K(

) , i.e., i,j K(x i ,x j )c i c j

0, can be expressed as an inner product

in a high dimensional space. Therefore, by using the kernel function, we actually

significantly increase the dimensionality of the input space, however, at the same

time, keep the computational complexity almost unchanged.

Support vector machines have been applied in many applications, and have

achieved great success. This is in part because the algorithm is very simple and

clear, and in part because the algorithm has its theoretical guarantee on the gen-

eralization ability. The generalization ability is related to the concept of the VC

dimension, which will be explained more in the next section.

≥

Learning to Rank for Information Retrieval

Search WWH ::

Custom Search

Home