In order to efficiently solve this optimization problem, the authors of [ 16 ] relax
the integer constraints on α_u and show that the relaxed problem becomes an eigen-
value decomposition problem, which can be efficiently solved by many state-of-
the-art eigensolvers.
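The exact formulation in [ 16 ] is not reproduced here, but the effect of such a relaxation can be illustrated with a minimal sketch: assuming the relaxed problem takes the standard form of maximizing a quadratic objective α^T M α under a unit-norm constraint on α, the solution is the leading eigenvector of M. The matrix M below is only a random stand-in for whatever affinity matrix the actual method builds.

```python
import numpy as np

def relaxed_selection_scores(M):
    """Leading eigenvector of a symmetric matrix M, used as relaxed selection scores."""
    eigenvalues, eigenvectors = np.linalg.eigh(M)  # eigh assumes M is symmetric
    return eigenvectors[:, -1]                     # eigenvector of the largest eigenvalue

# Toy usage with a random symmetric matrix standing in for the real one.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
M = (A + A.T) / 2.0
alpha = relaxed_selection_scores(M)
selected = np.argsort(-np.abs(alpha))[:3]  # e.g., round the relaxed solution by keeping the top 3
```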
Experimental results on a commercial dataset show that the aforementioned ap-
proach to data selection works quite well, and the selected subset can lead to a ranker
with significantly better ranking performance than the original training data.
13.3.3 Feature Selection for Training
Similar to document selection, the selection of features may also influence the ef-
fectiveness and efficiency of the learning-to-rank algorithms.
A large number of features will reduce the efficiency of the learning-to-rank
algorithms. When this becomes an issue, the most straightforward solution is to
remove some of the less effective features.
Some features are not very useful, and some are even harmful to learning-
to-rank algorithms. In this case, using the entire feature set may hurt the
effectiveness of the learned model.
While there have been extensive studies on feature selection in classification, the
study on feature selection for ranking is still limited. In [ 17 ], Geng et al. argue that
it is not a good choice to directly apply the feature selection techniques for classifi-
cation to ranking and propose a new feature selection method specifically designed
for ranking. Basically, two kinds of information are considered in the method: the
importance of individual features and the similarity between features.
The importance of each feature is determined using an evaluation measure (e.g.,
MAP and NDCG) or a loss function (e.g., loss functions in Ranking SVM [ 20 , 21 ],
MCRank [ 23 ], or ListNet [ 5 ]). To compute this importance, one first ranks the
documents using the feature alone, and then evaluates the resulting ranking in terms
of the evaluation measure or the loss function. Note that for some features larger
values correspond to higher ranks, while for other features smaller values correspond
to higher ranks. When calculating the importance, it is therefore necessary to sort the
documents twice (in the normal order and in the inverse order) and take the better of
the two scores.
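A minimal sketch of this importance computation is given below, assuming graded relevance labels and per-query feature vectors are available; NDCG@k is implemented locally here for self-containment and is not necessarily the exact measure used in [ 17 ]. In practice the score would be computed per query and averaged over queries.

```python
import numpy as np

def ndcg_at_k(labels_in_ranked_order, k=10):
    """NDCG@k for a list of relevance labels given in ranked order."""
    labels = np.asarray(labels_in_ranked_order, dtype=float)
    gains = 2.0 ** labels[:k] - 1.0
    discounts = 1.0 / np.log2(np.arange(2, gains.size + 2))
    dcg = np.sum(gains * discounts)
    ideal_gains = 2.0 ** np.sort(labels)[::-1][:k] - 1.0
    idcg = np.sum(ideal_gains * (1.0 / np.log2(np.arange(2, ideal_gains.size + 2))))
    return dcg / idcg if idcg > 0 else 0.0

def feature_importance(feature_values, labels, k=10):
    """Rank the documents of one query by a single feature and score the ranking.
    Because a feature may be negatively oriented, both sort orders are tried
    and the better score is kept."""
    feature_values = np.asarray(feature_values)
    labels = np.asarray(labels)
    descending = np.argsort(-feature_values)
    ascending = np.argsort(feature_values)
    return max(ndcg_at_k(labels[descending], k), ndcg_at_k(labels[ascending], k))
```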
The similarity between features is used to remove redundancy in the selected fea-
tures. In [ 17 ], the similarity between two features is computed on the basis of their
ranking results. That is, each feature is regarded as a ranking model, and the similar-
ity between two features is represented by the similarity between the ranking results
that they produce. Many methods can be used to measure the distance between two
ranking results. Specifically, Kendall's τ [ 22 ] is chosen in [ 17 ].
Considering the above two aspects, the overall feature selection criterion is for-
malized as the following optimization problem: one selects those features with the
largest total importance scores and the smallest total similarity scores.
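One simple way to approximate such a criterion is a greedy sketch like the one below: repeatedly pick the feature with the highest score, then penalize the remaining features by their similarity to the chosen one. The trade-off weight `c` and the greedy strategy are illustrative assumptions, not necessarily the exact procedure of [ 17 ].

```python
import numpy as np

def select_features(importance, similarity, t, c=1.0):
    """importance: (m,) importance score per feature; similarity: (m, m) pairwise
    similarities (e.g., Kendall's tau); t: number of features to keep."""
    m = len(importance)
    remaining = set(range(m))
    selected = []
    score = np.array(importance, dtype=float)
    for _ in range(t):
        best = max(remaining, key=lambda j: score[j])
        selected.append(best)
        remaining.remove(best)
        for j in remaining:
            score[j] -= c * similarity[best, j]  # penalize redundancy with the chosen feature
    return selected
```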