20.3 Feature Engineering
Once a set of features has been extracted for each document, the learning-to-rank
problem may look like a standard prediction task. However, ranking is deeply rooted
in information retrieval, so the eventual goal of learning to rank is not only to
develop new algorithms and theories, but also to substantially improve ranking
performance in real information retrieval applications. For this purpose, feature
engineering cannot be overlooked. A critical question is whether the knowledge on
information retrieval accumulated over the past half century can be encoded in the
extracted features.
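As a minimal sketch of what encoding retrieval knowledge into features might look like, the Python snippet below computes a few classical query-document signals: summed term frequency, an IDF-weighted sum, query-term coverage, and document length. The function name `extract_features`, the toy corpus, and the particular choice of features are assumptions made for illustration; they are not features prescribed by the text.

```python
import math
from collections import Counter

def extract_features(query, doc, corpus):
    """Return a small feature vector for one query-document pair.

    Hypothetical illustration: these are classical IR signals picked
    for the example, not a recommended production feature set.
    """
    q_terms = query.lower().split()
    d_terms = doc.lower().split()
    tf = Counter(d_terms)
    n_docs = len(corpus)

    def idf(term):
        # Smoothed inverse document frequency over the toy corpus.
        df = sum(1 for d in corpus if term in d.lower().split())
        return math.log((n_docs + 1) / (df + 1)) + 1.0

    sum_tf = sum(tf[t] for t in q_terms)
    sum_tfidf = sum(tf[t] * idf(t) for t in q_terms)
    # Fraction of query terms that occur in the document.
    coverage = sum(1 for t in q_terms if tf[t] > 0) / max(len(q_terms), 1)
    return [float(sum_tf), sum_tfidf, coverage, float(len(d_terms))]

# Toy corpus (assumed data, for illustration only).
corpus = [
    "learning to rank for information retrieval",
    "feature engineering for search engines",
    "cooking recipes for pasta",
]
features = [extract_features("learning rank", d, corpus) for d in corpus]
```

Each row of `features` has the same fixed layout, which is the kind of input format that different learning-to-rank algorithms can share.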
Currently, there is little published work on feature extraction for learning to rank.
There are at least two likely reasons. First, the learning-to-rank research community
has not paid enough attention to this topic, since feature extraction tends to be
regarded as engineering, while designing a learning algorithm seems to have more
research value. However, feature extraction itself also has considerable research
potential. For example, one can study how to automatically extract effective features
from the raw contents of the query and the web documents. Second, features are usually
regarded as a key business secret of a search engine. It is easy to switch to another
learning-to-rank algorithm (since different algorithms usually take the same input
format), but it is much more difficult to change the feature extraction pipeline,
because feature extraction is usually tightly coupled with the indexing and storage
systems of a search engine and encodes much human intelligence. As a result, for the
benchmark datasets released by commercial search engines, we usually have no access
to the features actually used in their live systems. Working with these commercial
search engines to share more insights on features would be very helpful for the future
development of the learning-to-rank community.
20.4 Advanced Ranking Models
In most existing learning-to-rank algorithms, a scoring function is used as the ranking
model for the sake of simplicity and efficiency. However, such a simplification
sometimes cannot handle complex ranking problems. Researchers have made some attempts
to leverage the inter-relationships between objects [12-14]; however, this is not yet
the most straightforward way of defining the hypothesis for ranking, especially for
the listwise approach.
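The scoring-function simplification can be sketched as follows: each document is scored independently of the others, and the ranked list is obtained by sorting the scores. The linear form, the name `rank_by_score`, and the toy feature vectors and weights are assumptions for illustration only.

```python
def rank_by_score(feature_vectors, weights):
    """Score each document on its own, then sort by descending score.

    Hypothetical linear scoring function: score = sum_j w_j * x_j.
    """
    scores = [sum(w * x for w, x in zip(weights, fv)) for fv in feature_vectors]
    # Indices of documents, best first; no document's score depends on another's.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order, scores

# Toy data (assumed): three documents with two features each.
docs = [[0.2, 1.0], [0.9, 0.1], [0.5, 0.5]]
weights = [1.0, 0.5]
order, scores = rank_by_score(docs, weights)
```

Because every score is computed from a single document's features, a model of this form has no direct way to express relationships between documents in the same list.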
Since the output space of the listwise approach is composed of permutations of
documents, the ranking hypothesis would ideally output permutations of documents
directly, rather than scores for individual documents. In this regard, defining the
ranking hypothesis as a multivariate function that directly outputs permutations
could be a future research topic. Note that the task is challenging because
permutation-based ranking functions can be very complex due to the extremely large
number of possible permutations. Nevertheless, it is worthwhile, and also possible,
to find efficient algorithms to deal with this situation.
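To give a sense of how large the permutation space is, the sketch below counts the permutations of n documents (n!) and performs an exhaustive search over all permutations of a tiny list. The list-level score used here, a position-discounted sum of hypothetical gains, is assumed purely for illustration and is not a ranking function proposed in the text.

```python
import math
from itertools import permutations

def num_permutations(n):
    """Size of the output space for a list of n documents."""
    return math.factorial(n)

def best_permutation_brute_force(docs, list_score):
    """Exhaustively search all permutations; feasible only for tiny lists."""
    return max(permutations(docs), key=list_score)

# Hypothetical per-document gains and a toy position-discounted list score.
gains = {"a": 1.0, "b": 3.0, "c": 2.0}

def list_score(perm):
    return sum(gains[d] / math.log2(pos + 2) for pos, d in enumerate(perm))

best = best_permutation_brute_force(list(gains), list_score)
sizes = {n: num_permutations(n) for n in (3, 10, 20)}
```

Three documents give only 6 candidate permutations, but 10 already give 3,628,800, so exhaustive search does not scale, which is why efficient algorithms would be needed for permutation-valued hypotheses.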