where P denotes the induced distribution, defined by drawing an importance-weighted pair from S, which in turn is randomly drawn from an underlying distribution P.
In other words, if we minimize the regret of the classifier during training, we correspondingly minimize the regret of the ranking model, and thereby effectively maximize NDCG.
Note that ω(y_u, y_v, c), the weight in the loss function (3.32), is a function of the classifier c. However, this classifier is exactly what we learn by minimizing the loss function, so there is a chicken-and-egg problem. To solve it, an online learning method is proposed. One first sets an initial value c_0 for the classifier and derives an update rule by minimizing the loss, yielding an updated classifier c_1. The loss function is then updated by substituting c_1 into the weight, and the classifier is further updated to c_2 by minimizing this new loss. This alternation is repeated until convergence.
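The alternation above can be sketched as follows. This is a minimal illustration, not the exact algorithm of the original work: the weight function and the weighted pairwise hinge loss are hypothetical placeholders standing in for ω(y_u, y_v, c) and the loss (3.32).

```python
import numpy as np

def pair_weight(c, x_u, x_v):
    """Hypothetical stand-in for omega(y_u, y_v, c): here the weight
    simply grows with the score gap the current classifier assigns."""
    return 1.0 + abs(c @ (x_u - x_v))

def iterate_classifier(pairs, dim, steps=10, lr=0.1):
    """Chicken-and-egg scheme: alternately (a) freeze the weights using
    the current classifier c_t, then (b) update the classifier by one
    minimization step on the weighted pairwise hinge loss."""
    c = np.zeros(dim)  # c_0: initial classifier
    for _ in range(steps):
        # (a) compute weights with the current classifier held fixed
        weights = [pair_weight(c, x_u, x_v) for x_u, x_v, _ in pairs]
        # (b) one gradient step on the weighted pairwise hinge loss
        grad = np.zeros(dim)
        for w, (x_u, x_v, y) in zip(weights, pairs):
            if y * (c @ (x_u - x_v)) < 1.0:  # pair violates the margin
                grad -= w * y * (x_u - x_v)
        c = c - lr * grad / len(pairs)
    return c
```

In practice one would iterate until the change in c (or in the loss) falls below a tolerance rather than for a fixed number of steps.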
When applying the above idea to real ranking problems, an L1 regularization term is added to the loss function in order to encourage a sparse solution for the parameters of the ranking model. According to the experimental results reported in [31], this robust sparse ranker can outperform many other ranking methods, including LambdaRank [7].
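One standard way to handle such an L1 term is proximal gradient descent: take a gradient step on the smooth pairwise loss, then apply the soft-thresholding operator, which zeroes out small coordinates and so yields sparsity. The sketch below is illustrative only; the concrete loss and optimizer in [31] may differ.

```python
import numpy as np

def soft_threshold(w, t):
    """Proximal operator of the L1 norm: shrinks every coordinate
    toward zero by t and sets small coordinates exactly to zero."""
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

def l1_pairwise_step(c, pairs, lr=0.1, lam=0.05):
    """One proximal-gradient step on a pairwise hinge loss plus an
    L1 penalty lam * ||c||_1 (hypothetical placeholder loss)."""
    grad = np.zeros_like(c)
    for x_u, x_v, y in pairs:
        if y * (c @ (x_u - x_v)) < 1.0:  # pair violates the margin
            grad -= y * (x_u - x_v)
    c = c - lr * grad / len(pairs)       # gradient step on the smooth part
    return soft_threshold(c, lr * lam)   # proximal step for the L1 term
```

The larger lam is, the more aggressively coordinates are driven to exactly zero, trading some fit for a sparser ranking model.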
3.4 Summary
In this chapter, we have introduced several pairwise ranking methods, and shown how to improve their performance by balancing the distribution of document pairs across queries, emphasizing likely top-ranked documents, etc. Empirical studies have shown that such modifications indeed lead to performance gains.
In the next chapter, we will introduce the listwise approach, which also tries to
solve the four problems with the pairwise approach, probably in a more natural and
fundamental manner.
3.5 Exercises
3.1 What are the advantages of the pairwise approach as compared to the pointwise
approach?
3.2 Study the relationship among the loss functions used in different pairwise rank-
ing algorithms, and their relationships to the pairwise 0-1 loss.
3.3 Study the mathematical properties of the loss functions used in different pairwise ranking algorithms, such as convexity.
3.4 Study the inter-dependency among document pairs, and discuss the potential
issue of using the pairwise classification methods to perform learning to rank
on such data.
3.5 Provide an intuitive explanation of the margin in Ranking SVM.