The Pairwise Approach - Learning to Rank for Information Retrieval

Information Technology Reference

In-Depth Information

It is clear that the larger the value is, the more likely x v is irrelevant but still

ranked on the top of the final ranked list. As mentioned before, we should punish this

situation since errors occurring at the top positions will damage the user experience.

The proposal in [ 29 ] is to push such x v down from the top. Specifically, a convex,

non-negative, monotonically increasing function g is introduced for this purpose. If

g is very steep, an extremely high price for a high-ranked irrelevant documents will

be paid. Examples of steep functions include the exponential function ( g(r)

e r )

=

r p , where p is large). The power function is used

in [ 29 ] as an example. Accordingly, the overall loss with regards to x v becomes

and the power function ( g(r)

=

x u ,x v ,y u,v ) p

L(f

;

x v )

=

L(f

;

.

(3.27)

u,y u,v =

1

After formulating the above loss function, a Boosting-style algorithm is devel-

oped to minimize the loss. Experimental results show that by introducing the extra

penalty for the top-ranked irrelevant documents the ranking performance can be

significantly improved.

3.3.6 Ordered Weighted Average for Ranking

In [ 34 ], Usunier et al. also aim at solving the forth problem with the pairwise ap-

proach. In particular, a weighted pairwise classification method is proposed, which

emphasizes the correct ranking at the top positions of the final ranked list. Specif-

ically, convex Ordered Weighted Averaging (OWA) operators [ 35 ] are used, which

are parameterized by a set of decreasing weights α t (the weight α t is associated to

the t th highest loss), to make a weighted sum of the pairwise classification losses.

By choosing appropriate weights the convex OWA operators can lead to a loss func-

tion that focuses on the top-ranked documents.

Specifically, for

∀

s

=

(s 1 ,...,s m ) , its corresponding OWA is defined as follows:

m

owa α (s) =

α t s σ(t) ,

(3.28)

t

=

1

where σ is a permutation such that

s σ(t + 1 ) .

And accordingly, the loss functions in the pairwise ranking algorithms can be

refined as follows.

∀

t,s σ(t) ≥

owa v,y u,v = 1 L(f

y u,v ,x u ,x v ) .

L(f

;

x u )

=

;

(3.29)

In general, the above loss associates the largest weights to the largest pairwise

losses, and thus to the document pairs containing the irrelevant documents with the

greatest relevance scores.

Experimental results demonstrate that the use of the OWA based loss can lead to

better performance than the use of the original pairwise loss functions.

Learning to Rank for Information Retrieval

Search WWH ::

Custom Search

Home