Information Technology Reference
In-Depth Information
2.3 Classification-Based Algorithms
Analogously to reducing ranking to regression, one can also consider reducing rank-
ing to a classification problem. Classification is a kind of supervised learning prob-
lem in which the target variable that one is trying to predict is discrete. When for-
malizing ranking as classification, one regards the relevance degree given to a doc-
ument as a category label. Here we introduce some representative algorithms in this
sub-category.
2.3.1 Binary Classification for Ranking
There have been several works that study the use of a classification model for rel-
evance ranking in information retrieval, such as [ 4 , 9 ] and [ 16 ]. Here we take [ 16 ]
and [ 9 ] as examples to illustrate the basic idea.
m
j
SVM-Based Method
Given documents x
={
x j }
1 , and their binary relevance
=
m
j =
judgments y
={
y j }
associated with a query q , one regards all the relevant doc-
1
uments (i.e., y j =+
1) as positive examples while all the irrelevant documents (i.e.,
y j =−
1) as negative examples, and therefore formulates ranking as a binary classi-
fication problem.
In particular, Support Vector Machines (SVM) [ 21 , 22 ] are use to perform the
classification task in [ 16 ]. The formulation of SVM is as follows when applied to
the scenario of ranking (here we suppose linear scoring functions are used, i.e.,
f(w,x)
w T x ),
=
m (i)
n
min 1
ξ (i)
j
2
2
w
+
λ
i
=
1
j
=
1
w T x (i)
j
+ ξ (i)
j
if y (i)
j
s.t.
≤−
1
,
=
0 .
(2.2)
w T x (i)
j
ξ (i)
j
if y (i)
j
1
,
=
1 .
ξ (i)
j
1 ,...,m (i) ,i =
0 ,j =
1 ,...,n,
where x (i)
j ,y (i j are the j th document and its label with regards to query q i and m (i)
is the number of documents associated with query q i .
The constraints in the above optimization problem correspond to whether each
training document x (i)
j is classified into the right binary class. Actually the loss
function behind the above optimization problem is a hinge loss defined on each
document. For example, if the label y (i)
j
1, and the model output w T x (i)
j
is
+
is no
less than
+
1, then there is a zero loss for this document. Otherwise, the loss will
be ξ (i)
j , which is usually called the soft margin term. We plot the curve of the hinge
loss in Fig. 2.2 for ease of understanding.
Search WWH ::

Custom Search