In order to efficiently solve this optimization problem, the authors of [ 16 ] relax
the integer constraints on α_u and show that the relaxed problem becomes an eigen-
value decomposition problem, which can be efficiently solved by many state-of-
the-art eigensolvers.
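The exact formulation in [ 16 ] is not reproduced here, but the effect of such a relaxation can be illustrated with a minimal sketch: assuming the relaxed problem takes the standard form of maximizing a quadratic objective α^T M α under a unit-norm constraint on α, the solution is the leading eigenvector of M. The matrix M below is only a random stand-in for whatever affinity matrix the actual method builds.

```python
import numpy as np

def relaxed_selection_scores(M):
    """Leading eigenvector of a symmetric matrix M, used as relaxed selection scores."""
    eigenvalues, eigenvectors = np.linalg.eigh(M)  # eigh assumes M is symmetric
    return eigenvectors[:, -1]                     # eigenvector of the largest eigenvalue

# Toy usage with a random symmetric matrix standing in for the real one.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
M = (A + A.T) / 2.0
alpha = relaxed_selection_scores(M)
selected = np.argsort(-np.abs(alpha))[:3]  # e.g., round the relaxed solution by keeping the top 3
```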
Experimental results on a commercial dataset show that the aforementioned ap-
proach to data selection works quite well, and the selected subset can lead to a ranker
with significantly better ranking performance than the original training data.
13.3.3 Feature Selection for Training
Similar to document selection, the selection of features may also influence the ef-
fectiveness and efficiency of the learning-to-rank algorithms.
A large number of features will reduce the efficiency of the learning-to-rank
algorithms. When this becomes an issue, the most straightforward solution is to
remove some of the less effective features.
Some features are not very useful, and some are even harmful to learning-
to-rank algorithms. In this case, using the entire feature set may hurt the
effectiveness of the learned model.
While there have been extensive studies on feature selection in classification, the
study on feature selection for ranking is still limited. In [ 17 ], Geng et al. argue that
it is not a good choice to directly apply the feature selection techniques for classifi-
cation to ranking and propose a new feature selection method specifically designed
for ranking. Basically, two kinds of information are considered in the method: the
importance of individual features and the similarity between features.
The importance of each feature is determined using an evaluation measure (e.g.,
MAP and NDCG) or a loss function (e.g., loss functions in Ranking SVM [ 20 , 21 ],
MCRank [ 23 ], or ListNet [ 5 ]). To compute this importance, one first ranks the
documents using the feature alone, and then evaluates the resulting ranking in terms
of the evaluation measure or the loss function. Note that for some features larger
values correspond to higher ranks, while for other features smaller values correspond
to higher ranks. When calculating the importance, it is therefore necessary to sort the
documents twice (in the normal order and in the inverse order) and take the better of
the two scores.
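A minimal sketch of this importance computation is given below, assuming graded relevance labels and per-query feature vectors are available; NDCG@k is implemented locally here for self-containment and is not necessarily the exact measure used in [ 17 ]. In practice the score would be computed per query and averaged over queries.

```python
import numpy as np

def ndcg_at_k(labels_in_ranked_order, k=10):
    """NDCG@k for a list of relevance labels given in ranked order."""
    labels = np.asarray(labels_in_ranked_order, dtype=float)
    gains = 2.0 ** labels[:k] - 1.0
    discounts = 1.0 / np.log2(np.arange(2, gains.size + 2))
    dcg = np.sum(gains * discounts)
    ideal_gains = 2.0 ** np.sort(labels)[::-1][:k] - 1.0
    idcg = np.sum(ideal_gains * (1.0 / np.log2(np.arange(2, ideal_gains.size + 2))))
    return dcg / idcg if idcg > 0 else 0.0

def feature_importance(feature_values, labels, k=10):
    """Rank the documents of one query by a single feature and score the ranking.
    Because a feature may be negatively oriented, both sort orders are tried
    and the better score is kept."""
    feature_values = np.asarray(feature_values)
    labels = np.asarray(labels)
    descending = np.argsort(-feature_values)
    ascending = np.argsort(feature_values)
    return max(ndcg_at_k(labels[descending], k), ndcg_at_k(labels[ascending], k))
```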
The similarity between features is used to remove redundancy in the selected fea-
tures. In [ 17 ], the similarity between two features is computed on the basis of their
ranking results. That is, each feature is regarded as a ranking model, and the similar-
ity between two features is represented by the similarity between the ranking results
that they produce. Many methods can be used to measure the distance between two
ranking results. Specifically, Kendall's τ [ 22 ] is chosen in [ 17 ].
Considering the above two aspects, the overall feature selection criterion is for-
malized as the following optimization problem: one selects those features with the
largest total importance scores and the smallest total similarity scores.
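One simple way to approximate such a criterion is a greedy sketch like the one below: repeatedly pick the feature with the highest score, then penalize the remaining features by their similarity to the chosen one. The trade-off weight `c` and the greedy strategy are illustrative assumptions, not necessarily the exact procedure of [ 17 ].

```python
import numpy as np

def select_features(importance, similarity, t, c=1.0):
    """importance: (m,) importance score per feature; similarity: (m, m) pairwise
    similarities (e.g., Kendall's tau); t: number of features to keep."""
    m = len(importance)
    remaining = set(range(m))
    selected = []
    score = np.array(importance, dtype=float)
    for _ in range(t):
        best = max(remaining, key=lambda j: score[j])
        selected.append(best)
        remaining.remove(best)
        for j in remaining:
            score[j] -= c * similarity[best, j]  # penalize redundancy with the chosen feature
    return selected
```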