function. In particular, it defines the loss function based on the cosine similarity
between the score vector output by the scoring function f for query q and the
score vector defined by the ground truth labels (referred to as the cosine loss for
short). That is,
$$L(f;\mathbf{x},\mathbf{y}) = \frac{1}{2}\left(1 - \frac{\sum_{j=1}^{m}\phi(y_j)\,\phi(f(x_j))}{\sqrt{\sum_{j=1}^{m}\phi^{2}(y_j)}\,\sqrt{\sum_{j=1}^{m}\phi^{2}(f(x_j))}}\right) \qquad (2.20)$$
where ϕ is a transformation function, which can be linear, exponential, or logistic.
After defining the cosine loss, the gradient descent method is used to perform the
optimization and learn the scoring function.
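As a concrete illustration, the sketch below computes the cosine loss of Eq. (2.20) and fits a linear scoring function by plain gradient descent. The feature vectors, labels, identity choice of ϕ, learning rate, and the finite-difference gradient are all hypothetical simplifications for brevity, not the setup of [17]:

```python
import math

def cosine_loss(labels, scores, phi=lambda t: t):
    """Cosine loss of Eq. (2.20); phi defaults to the identity transformation."""
    a = [phi(y) for y in labels]
    b = [phi(s) for s in scores]
    dot = sum(ai * bi for ai, bi in zip(a, b))
    norm = math.sqrt(sum(ai * ai for ai in a)) * math.sqrt(sum(bi * bi for bi in b))
    return 0.5 * (1.0 - dot / norm)

# Hypothetical query: three documents (two features each) with graded labels.
docs = [[1.0, 0.2], [0.5, 0.8], [0.1, 0.1]]
labels = [2.0, 1.0, 0.0]

def loss_of(w):
    # Linear scoring function f(x) = w . x applied to every document of the query.
    scores = [sum(wi * xi for wi, xi in zip(w, x)) for x in docs]
    return cosine_loss(labels, scores)

# Plain gradient descent with a finite-difference gradient (illustrative only;
# in practice the analytic gradient of Eq. (2.20) would be used).
w = [0.1, 0.1]
initial_loss = loss_of(w)
eps, lr = 1e-6, 0.5
for _ in range(200):
    grad = [(loss_of([wi + (eps if i == j else 0.0) for j, wi in enumerate(w)])
             - loss_of(w)) / eps for i in range(len(w))]
    w = [wi - lr * g for wi, g in zip(w, grad)]
final_loss = loss_of(w)
```

After a few hundred steps the loss drops well below its initial value, with the learned scores ordering the documents consistently with their labels.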
According to [17], the so-defined cosine loss has the following properties.
- The cosine loss can be regarded as a kind of regression loss, since it requires the predicted relevance of a document to be as close as possible to the ground truth label.
- Because of the query-level normalization factor (the denominator in the loss function), the cosine loss is insensitive to the varying numbers of documents associated with different queries.
- The cosine loss is bounded between 0 and 1; thus the overall loss on the training set will not be dominated by a few hard queries.
- The cosine loss is scale invariant. That is, if we multiply all the ranking scores output by the scoring function by the same positive constant, the cosine loss will not change. This is quite in accordance with our intuition on ranking, which depends only on the relative order of the scores.
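The boundedness and scale-invariance properties can be checked numerically. In the sketch below the labels, scores, and the identity choice of ϕ are hypothetical:

```python
import math

def cosine_loss(labels, scores):
    # Cosine loss of Eq. (2.20) with the identity transformation phi(t) = t.
    dot = sum(y * s for y, s in zip(labels, scores))
    norm = math.sqrt(sum(y * y for y in labels)) * math.sqrt(sum(s * s for s in scores))
    return 0.5 * (1.0 - dot / norm)

labels = [2.0, 1.0, 0.0]
scores = [1.5, 0.4, 0.9]        # hypothetical model outputs

base = cosine_loss(labels, scores)
scaled = cosine_loss(labels, [10.0 * s for s in scores])  # same positive constant
worst = cosine_loss(labels, [-y for y in labels])         # scores opposite to labels
```

Here `base` and `scaled` coincide (scale invariance), and `worst` attains the upper bound of 1, since the cosine similarity lies in [-1, 1] and the loss is 0.5(1 - cos).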
2.6 Summary
In this chapter, we have introduced various pointwise ranking methods, discussed
their relationships with previous learning-based information retrieval models, and
analyzed their limitations.
So far, the pointwise approach can only be a sub-optimal solution to ranking. To
tackle this problem, researchers have attempted to take document pairs, or the
entire set of documents associated with the same query, as the input object. This
results in the pairwise and listwise approaches to learning to rank. With the
pairwise approach, the relative order between documents can be better modeled.
With the listwise approach, positional information becomes visible to the
learning-to-rank process.
2.7 Exercises
2.1 Enumerate widely used loss functions for classification, and prove whether they
are convex.