2.3 Classification-Based Algorithms
Analogously to reducing ranking to regression, one can also consider reducing rank-
ing to a classification problem. Classification is a kind of supervised learning prob-
lem in which the target variable that one is trying to predict is discrete. When for-
malizing ranking as classification, one regards the relevance degree given to a doc-
ument as a category label. Here we introduce some representative algorithms in this
sub-category.
2.3.1 Binary Classification for Ranking
There have been several works that study the use of a classification model for relevance ranking in information retrieval, such as [4, 9] and [16]. Here we take [16] and [9] as examples to illustrate the basic idea.
SVM-Based Method

Given documents $x = \{x_j\}_{j=1}^{m}$ and their binary relevance judgments $y = \{y_j\}_{j=1}^{m}$ associated with a query $q$, one regards all the relevant documents (i.e., $y_j = +1$) as positive examples while all the irrelevant documents (i.e., $y_j = -1$) as negative examples, and therefore formulates ranking as a binary classification problem.
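To make this reduction concrete, here is a small sketch (with made-up feature values and labels) of pooling per-query documents into a single binary classification dataset, where relevant documents become positive examples and irrelevant ones negative examples:

```python
# Sketch: pooling per-query documents into one binary classification
# dataset, as in the reduction described above. All feature values
# and labels are made up for illustration.
import numpy as np

# Two toy queries, each with feature vectors and binary relevance
# judgments (+1 = relevant, -1 = irrelevant) for its documents.
queries = {
    "q1": (np.array([[0.9, 0.2], [0.1, 0.4]]), np.array([+1, -1])),
    "q2": (np.array([[0.7, 0.8], [0.3, 0.1]]), np.array([+1, -1])),
}

# Pool all (document, label) pairs across queries into one dataset.
X = np.vstack([feats for feats, _ in queries.values()])
y = np.concatenate([labels for _, labels in queries.values()])

print(X.shape, y.tolist())  # (4, 2) [1, -1, 1, -1]
```

Any standard binary classifier can then be trained on `(X, y)`, and at test time the documents of a query are ranked by the classifier's real-valued score.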
In particular, Support Vector Machines (SVM) [21, 22] are used to perform the classification task in [16]. The formulation of SVM is as follows when applied to the scenario of ranking (here we suppose linear scoring functions are used, i.e., $f(w, x) = w^T x$),
$$
\begin{aligned}
\min\ & \frac{1}{2}\|w\|^2 + \lambda \sum_{i=1}^{n}\sum_{j=1}^{m^{(i)}} \xi_j^{(i)} \\
\text{s.t.}\ & w^T x_j^{(i)} \le -1 + \xi_j^{(i)}, \quad \text{if } y_j^{(i)} = -1, \\
& w^T x_j^{(i)} \ge 1 - \xi_j^{(i)}, \quad \text{if } y_j^{(i)} = +1, \\
& \xi_j^{(i)} \ge 0, \quad j = 1, \ldots, m^{(i)},\ i = 1, \ldots, n,
\end{aligned}
\tag{2.2}
$$
where $x_j^{(i)}$ and $y_j^{(i)}$ are the $j$th document and its label with regard to query $q_i$, $m^{(i)}$ is the number of documents associated with query $q_i$, and $n$ is the number of queries.
The constraints in the above optimization problem correspond to whether each training document $x_j^{(i)}$ is classified into the right binary class. Actually, the loss function behind the above optimization problem is a hinge loss defined on each document. For example, if the label $y_j^{(i)} = +1$ and the model output $w^T x_j^{(i)}$ is no less than $+1$, then there is zero loss for this document. Otherwise, the loss will be $\xi_j^{(i)}$, which is usually called the soft margin term. We plot the curve of the hinge loss in Fig. 2.2 for ease of understanding.
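The per-document hinge loss described above can be written down directly; this small sketch evaluates it at a few scores (the scores themselves are made up for illustration):

```python
# Hinge loss on a single document: zero when y * w^T x >= 1,
# otherwise the slack 1 - y * w^T x.
def hinge(score: float, label: float) -> float:
    return max(0.0, 1.0 - label * score)

# A relevant document (label +1) scored at or beyond the margin
# incurs no loss; scores inside the margin are penalized linearly.
print(hinge(1.5, +1.0))   # 0.0
print(hinge(0.2, +1.0))   # 0.8
print(hinge(-1.0, +1.0))  # 2.0
```

The piecewise-linear shape of these values is exactly the curve shown in Fig. 2.2: flat at zero beyond the margin, then increasing with slope one as the score moves to the wrong side.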