Future Work - Learning to Rank for Information Retrieval


20.3 Feature Engineering

Once a set of features has been extracted for each document, the learning-to-rank problem may appear to become a standard prediction task. However, ranking is deeply rooted in information retrieval, so the eventual goal of learning to rank is not only to develop new algorithms and theories, but also to substantially improve ranking performance in real information retrieval applications. For this purpose, feature engineering cannot be overlooked. A critical question is whether we can encode the knowledge on information retrieval accumulated over the past half century in the extracted features.
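As one illustration of encoding established retrieval knowledge as features, the sketch below computes a few classical signals (summed term frequency, summed inverse document frequency, a BM25 score, and document length) for a single query-document pair. The function name, the feature names, and the parameter defaults k1=1.2 and b=0.75 are assumptions chosen for illustration; they are not the feature set of any particular benchmark or search engine.

```python
import math
from collections import Counter


def bm25_features(query_terms, doc_terms, doc_freq, num_docs, avg_doc_len,
                  k1=1.2, b=0.75):
    """Return a small feature dict for one query-document pair.

    doc_freq maps a term to the number of documents containing it
    (hypothetical collection statistics supplied by the caller).
    """
    tf = Counter(doc_terms)
    doc_len = len(doc_terms)
    sum_tf = 0.0
    sum_idf = 0.0
    bm25 = 0.0
    for term in query_terms:
        f = tf[term]                      # term frequency in this document
        n = doc_freq.get(term, 0)         # document frequency in the collection
        idf = math.log((num_docs - n + 0.5) / (n + 0.5) + 1.0)
        sum_tf += f
        sum_idf += idf
        # Length-normalised saturation of the term frequency.
        norm = f + k1 * (1.0 - b + b * doc_len / avg_doc_len)
        bm25 += idf * f * (k1 + 1.0) / norm
    return {
        "sum_tf": sum_tf,
        "sum_idf": sum_idf,
        "bm25": bm25,
        "doc_len": float(doc_len),
    }
```

Each value in the returned dictionary would become one dimension of the feature vector passed to a learning-to-rank algorithm; hand-designed retrieval models such as BM25 thus enter the learned ranker as inputs rather than as competitors.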

Currently, there is not much work on feature extraction for learning to rank. There may be a couple of reasons. First, the learning-to-rank research community has not paid enough attention to this topic, since feature extraction is often regarded as engineering, while designing a learning algorithm seems to have more research value. However, we would like to point out that feature extraction itself also has a lot of research potential. For example, one can study how to automatically extract effective features from the raw contents of the query and the web documents. Second, features are usually regarded as a key business secret of a search engine. It is easy to switch to another learning-to-rank algorithm (since different algorithms usually take the same input format), but it is much more difficult to change the feature extraction pipeline, because feature extraction is usually tightly coupled with the indexing and storage systems of a search engine and encodes much human intelligence. As a result, for the benchmark datasets released by commercial search engines, we usually have no access to the features actually used in their live systems. Working with these commercial search engines to share more insights on features would be very helpful for the future development of the learning-to-rank community.
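To make the remark about a shared input format concrete, here is a minimal sketch that parses lines of the form "label qid:<id> <index>:<value> ... # comment", a layout commonly used for released ranking datasets. The text does not name a specific format, so this layout is an assumption, and the function names parse_ranking_line and group_by_query are hypothetical.

```python
from collections import defaultdict


def parse_ranking_line(line):
    """Parse one 'label qid:<id> <index>:<value> ... # comment' line."""
    body = line.split("#", 1)[0].split()   # drop the trailing comment, if any
    label = int(body[0])                    # relevance label
    qid = body[1].split(":", 1)[1]          # query identifier, kept as a string
    features = {}
    for token in body[2:]:
        index, value = token.split(":", 1)
        features[int(index)] = float(value)
    return label, qid, features


def group_by_query(lines):
    """Group parsed (label, features) pairs under their query identifier."""
    groups = defaultdict(list)
    for line in lines:
        if line.strip():
            label, qid, features = parse_ranking_line(line)
            groups[qid].append((label, features))
    return dict(groups)
```

Because any algorithm that consumes such (label, query, feature vector) records can be swapped for another, the learner is the replaceable part; the values behind each feature index are what depend on the indexing and storage systems.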

20.4 Advanced Ranking Models

In most existing learning-to-rank algorithms, a scoring function is used as the ranking model for the sake of simplicity and efficiency. However, such a simplification sometimes cannot handle complex ranking problems. Researchers have made some attempts at leveraging the inter-relationships between objects [12-14]; however, this is not yet the most straightforward way of defining the hypothesis for ranking, especially for the listwise approach.

Since the output space of the listwise approach is composed of permutations of documents, the ranking hypothesis would ideally output permutations of documents directly, rather than scores for individual documents. In this regard, defining the ranking hypothesis as a multivariate function that directly outputs permutations could be a future research topic. Note that the task is challenging because permutation-based ranking functions can be very complex due to the extremely large number of possible permutations (n! for n documents). Nevertheless, it is worthwhile, and should be possible, to find efficient algorithms to deal with this situation.
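The contrast between the two kinds of hypothesis can be sketched as follows. The first function ranks by sorting per-document scores; the second evaluates whole orderings and returns the best one by brute force, which is only feasible for very short lists. The list-level objective diversity_aware_score, which discounts relevance by position and penalises adjacent documents on the same topic, is a hypothetical example of an inter-document dependency and is not taken from the cited work.

```python
import math
from itertools import permutations


def rank_by_score(docs, score):
    """Scoring-function hypothesis: score each document alone, then sort."""
    return sorted(docs, key=score, reverse=True)


def rank_by_permutation(docs, list_score):
    """Permutation hypothesis: pick the best whole ordering (n! candidates)."""
    return list(max(permutations(docs), key=list_score))


def diversity_aware_score(ordering):
    """Hypothetical list-level objective over (doc_id, relevance, topic) tuples."""
    total = 0.0
    for pos, (_doc_id, rel, _topic) in enumerate(ordering):
        total += rel / math.log2(pos + 2)      # position-discounted relevance
    for left, right in zip(ordering, ordering[1:]):
        if left[2] == right[2]:
            total -= 1.0                        # penalty for adjacent same-topic docs
    return total


def num_permutations(n):
    """Size of the output space for a list of n documents."""
    return math.factorial(n)
```

With documents such as ("a", 3.0, "x"), ("b", 2.5, "x"), ("c", 1.0, "y"), sorting by relevance alone places the two same-topic documents next to each other, whereas the permutation search can separate them, something no per-document score expresses directly. The cost is the size of the search space: num_permutations(10) is already 3,628,800, which is why efficient algorithms are needed.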
