Relational Ranking - Learning to Rank for Information Retrieval

Information Technology Reference

In-Depth Information

In [ 1 ], it is investigated how to diversify search results by making explicit use of

knowledge about the topics that the query or the documents may refer to. Specif-

ically it is assumed that there exists a taxonomy of information, and the diversity

in user intents is modeled at the topical level of the taxonomy. On this basis, an

objective is proposed to explicitly trade off relevance and diversity as follows:

q) 1

q,c) ,

1

|

−

|

P(S

|

q)

=

P(c

V(x

(6.13)

c

x

∈

S

where V(x

q,c) denotes the relevance probability of a document x for query q

when the intended category is c .

It is not difficult to get the probabilistic interpretation of the above objective.

Actually, it exactly describes the probability that at least one of the documents in

the document set S satisfies the average user (in terms of expectation) who issues

query q .

It has been proven that the general problem of optimizing the above objective

is NP-hard. The good news is that the objective function admits a submodularity

structure [12] that can be exploited for the implementation of a good approxima-

tion algorithm. Intuitively, a submodular function satisfies the economic principle

of diminishing marginal returns, i.e., the marginal benefit of adding a document to

a larger collection is less than that of adding it to a smaller collection. With this

property, a greedy algorithm can be designed.

Given the top- k documents selected by some classical ranking algorithm for the

target query, the greedy algorithm reorders these documents to maximize the objec-

tive P(S

|

q,S) denote the conditional probability that query q belongs

to category c , given that all documents in set S fail to satisfy the user. Initially, be-

fore any document is selected, U(c

|

q) .Let U(c

|

q) . The algorithm selects output

documents one at a time. At every step, it chooses the document that has the highest

marginal utility defined as below,

|

q,

∅

)

=

P(c

|

g(x

|

q,c,S)

=

U(c

|

q,S)V(x

|

q,c).

(6.14)

This marginal utility can be interpreted as the probability that the selected doc-

ument satisfies the user given that all documents coming before it fail to do so. At

the end of the loop, the conditional distribution is updated to reflect the inclusion of

the new document to the result set using the Bayes rule.

As for the algorithm, the condition has been derived on which the greedy heuris-

tic can lead to the optimal diversification result. Some further analysis shows that

even if the condition is not met, the approximation error of the greedy method can

be well bounded.

Inspired by the aforementioned new algorithm, the authors of [ 1 ] further propose

refining existing evaluation measures in information retrieval to be intent-aware

measures. For example, now NDCG and MAP will become

NDCG IA @ k =

P(c | q) NDCG@ k,

(6.15)

c

Learning to Rank for Information Retrieval

Search WWH ::

Custom Search

Home