The Pairwise Approach - Learning to Rank for Information Retrieval


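where

$$
h_i(x_u, x_v) = \mathrm{sigmoid}\Big(\sum_t \theta_{u,t,i}\, x_{u,t} + \sum_t \theta_{v,t,i}\, x_{v,t} + b_i\Big)
= \mathrm{sigmoid}\Big(\sum_t \theta_{u,t,i'}\, x_{v,t} + \sum_t \theta_{v,t,i'}\, x_{u,t} + b_{i'}\Big) = h_{i'}(x_v, x_u). \tag{3.6}
$$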

Then, the optimal parameters $\theta$, $w$, and $b$ are learned by minimizing the following loss function:

$$
L(h; x_u, x_v, y_{u,v}) = \big(y_{u,v} - P(x_u \succ x_v)\big)^2 + \big(y_{v,u} - P(x_u \prec x_v)\big)^2. \tag{3.7}
$$
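The hidden unit in Eq. (3.6) and the squared loss in Eq. (3.7) can be sketched as follows. This is a minimal illustration, not the original implementation: the function names and toy numbers are hypothetical, and the "paired" unit is assumed to reuse the same weights with the two weight vectors swapped, since the full network definition is not shown in this excerpt.

```python
import math


def sigmoid(z):
    """Logistic function used by the hidden units."""
    return 1.0 / (1.0 + math.exp(-z))


def hidden_unit(theta_first, theta_second, bias, x_first, x_second):
    """One hidden unit over an ordered document pair, in the form of Eq. (3.6)."""
    z = bias
    z += sum(w * x for w, x in zip(theta_first, x_first))
    z += sum(w * x for w, x in zip(theta_second, x_second))
    return sigmoid(z)


def squared_pair_loss(p_succ, p_prec, y_uv, y_vu):
    """Eq. (3.7): squared error on both preference outputs."""
    return (y_uv - p_succ) ** 2 + (y_vu - p_prec) ** 2


# Toy values (assumed for illustration only).
theta_u, theta_v, b = [0.5, -1.0], [0.2, 0.3], 0.1
x_u, x_v = [1.0, 2.0], [0.0, 1.0]

h_i = hidden_unit(theta_u, theta_v, b, x_u, x_v)
# Assumed weight sharing: the paired unit swaps the two weight vectors,
# so feeding it the swapped pair reproduces the same activation.
h_paired = hidden_unit(theta_v, theta_u, b, x_v, x_u)

loss = squared_pair_loss(p_succ=0.8, p_prec=0.2, y_uv=1.0, y_vu=0.0)
```

Under the assumed sharing scheme, swapping the input order only permutes the hidden activations, which is what allows the two network outputs to stay consistent with each other.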

For testing, the learned preference function is used to generate pairwise preferences for all possible document pairs. Then an additional sorting (aggregation) step, just as in [12], is used to resolve the conflicts in these pairwise preferences and to generate a final ranked list.
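To make the test-time procedure concrete, the sketch below scores every ordered document pair and then aggregates the results into a single list. It does not reproduce the procedure of [12]; as a stand-in it uses a simple rule (sort by net preference, i.e., total preference for a document minus total preference against it). The function name, the `pref` callable, and the toy probabilities are all hypothetical.

```python
from itertools import permutations


def rank_by_net_preference(docs, pref):
    """Order docs by sum over others of pref(d, o) - pref(o, d).

    `pref(a, b)` is assumed to return the learned preference for a over b.
    """
    net = {d: 0.0 for d in docs}
    for a, b in permutations(docs, 2):  # all ordered pairs with a != b
        p = pref(a, b)
        net[a] += p
        net[b] -= p
    return sorted(docs, key=lambda d: net[d], reverse=True)


# Toy pairwise preferences containing a cycle: a > b, b > c, c > a.
toy_prefs = {
    ("a", "b"): 0.9, ("b", "a"): 0.1,
    ("b", "c"): 0.8, ("c", "b"): 0.2,
    ("c", "a"): 0.6, ("a", "c"): 0.4,
}

ranking = rank_by_net_preference(["a", "b", "c"], lambda x, y: toy_prefs[(x, y)])
```

Even though the toy preferences are cyclic, the aggregation step still yields a total order, which is the role the sorting step plays in the text.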

3.2.3 RankNet: Learning to Rank with Gradient Descent

RankNet [8] is one of the learning-to-rank algorithms used by commercial search engines.¹

In RankNet, the loss function is also defined on a pair of documents, but the hypothesis is defined with the use of a scoring function $f$. Given two documents $x_u$ and $x_v$ associated with a training query $q$, a target probability $\bar{P}_{u,v}$ is constructed based on their ground truth labels. For example, we can define $\bar{P}_{u,v} = 1$ if $y_{u,v} = 1$, and $\bar{P}_{u,v} = 0$ otherwise. Then, the modeled probability $P_{u,v}$ is defined based on the difference between the scores of these two documents given by the scoring function, i.e.,

$$
P_{u,v}(f) = \frac{\exp\big(f(x_u) - f(x_v)\big)}{1 + \exp\big(f(x_u) - f(x_v)\big)}. \tag{3.8}
$$
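As a small illustration of Eq. (3.8), the modeled probability is a logistic function of the score difference. The sketch below assumes the two scores $f(x_u)$ and $f(x_v)$ have already been computed; the function name is hypothetical, and the branching is only a numerical safeguard against overflow, not part of the definition.

```python
import math


def modeled_probability(s_u, s_v):
    """P_{u,v}(f) from Eq. (3.8), with s_u = f(x_u) and s_v = f(x_v)."""
    d = s_u - s_v
    # exp(d) / (1 + exp(d)) equals 1 / (1 + exp(-d)); pick the form
    # whose exponent is non-positive so math.exp cannot overflow.
    if d >= 0:
        return 1.0 / (1.0 + math.exp(-d))
    e = math.exp(d)
    return e / (1.0 + e)


p_equal = modeled_probability(1.0, 1.0)    # equal scores
p_higher = modeled_probability(2.0, 0.0)   # x_u scored above x_v
p_lower = modeled_probability(0.0, 2.0)    # same pair, order swapped
```

Equal scores give a probability of one half, and swapping the two documents gives the complementary probability.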

Then the cross entropy between the target probability and the modeled probability is used as the loss function, which we refer to as the cross entropy loss for short.

$$
L(f; x_u, x_v, y_{u,v}) = -\bar{P}_{u,v} \log P_{u,v}(f) - \big(1 - \bar{P}_{u,v}\big) \log\big(1 - P_{u,v}(f)\big). \tag{3.9}
$$

¹ As far as we know, Microsoft Bing Search (http://www.bing.com/) uses a model trained with a variation of RankNet.
