Information Technology Reference

In-Depth Information

Chapter 3

The Pairwise Approach

Abstract
In this chapter we will introduce the pairwise approach to learning to

rank. Specifically we first introduce several example algorithms, whose major differ-

ences are in the loss functions. Then we discuss the limitations of these algorithms

and present some improvements that enable better ranking performance.

3.1 Overview

The pairwise approach does not focus on accurately predicting the relevance degree

of each document; instead, it cares about the relative order between two documents.

In this sense, it is closer to the concept of “ranking” than the pointwise approach.

In the pairwise approach, ranking is usually reduced to a classification on docu-

ment pairs, i.e., to determine which document in a pair is preferred. That is, the goal

of learning is to minimize the number of miss-classified document pairs. In the ex-

treme case, if all the document pairs are correctly classified, all the documents will

be correctly ranked. Note that this classification differs from the classification in the

pointwise approach, since it operates on every two documents under investigation.

A natural concern is that document pairs are not independent, which violates the

basic assumption of classification. The fact is that although in some cases this as-

sumption really does not hold, one can still use classification technology to learn the

ranking model. However, a different theoretical framework is needed to analyze the

generalization of the learning process. We will make discussions on this in Part VI.

There are many pairwise ranking algorithms proposed in the literature of learning

to rank, based on neural network [
8
], perceptron [
21
,
30
], Boosting [
18
], support

vector machines [
22
,
23
], and other learning machines [
12
,
33
,
36
,
37
]. In the rest

of this chapter, we will introduce several examples of them.

3.2 Example Algorithms

3.2.1 Ordering with Preference Function

In [
12
], a hypothesis
h(x
u
,x
v
)
directly defined on a pair of documents is studied

(i.e., without use of the scoring function
f
). In particular, given two documents
x
u