Fig. 3.3 A two-layer neural network
It is not difficult to verify that the cross entropy loss is an upper bound of the pairwise 0-1 loss, which is defined by

$$
L_{0\text{-}1}(f; x_u, x_v, y_{u,v}) =
\begin{cases}
1, & y_{u,v}\,\bigl(f(x_u) - f(x_v)\bigr) < 0, \\
0, & \text{otherwise}.
\end{cases}
\tag{3.10}
$$
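As a quick sanity check of this upper-bound claim, the following sketch compares the two losses numerically for a pair with target probability 1 (i.e., $y_{u,v} = 1$). It assumes the cross entropy loss is measured in bits (log base 2, equivalently scaled by $1/\ln 2$), a common convention under which the pointwise bound is easy to verify; the variable names are illustrative, not from the book.

```python
import numpy as np

# z = f(x_u) - f(x_v), the score difference for a pair with y_{u,v} = 1.
z = np.linspace(-5.0, 5.0, 201)

# Cross entropy loss for a pair whose target probability is 1, measured
# in bits (log base 2) -- an assumed scaling for this sketch.
cross_entropy = np.log2(1.0 + np.exp(-z))

# Pairwise 0-1 loss (Eq. 3.10): 1 if the pair is mis-ranked, else 0.
zero_one = (z < 0).astype(float)

# The bound holds pointwise: a mis-ranked pair has z < 0, hence
# 1 + exp(-z) > 2 and log2(1 + exp(-z)) > 1.
assert np.all(cross_entropy >= zero_one)
```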
A neural network is then used as the model and gradient descent as the optimization algorithm to learn the scoring function f. A typical two-layer neural network is shown in Fig. 3.3: the features of a document are fed in at the bottom layer; the second layer consists of several hidden nodes, each involving a sigmoid transformation; and the output of the network is the ranking score of the document.
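A minimal sketch of such a scoring network is given below, assuming d input features, h hidden nodes with sigmoid activations, and a single linear output node; the parameter names and initialization scheme are illustrative assumptions, not details from the book.

```python
import numpy as np

def init_params(d, h, rng=np.random.default_rng(0)):
    """Randomly initialize a two-layer scoring network: d input features,
    h hidden nodes with sigmoid transformations, one linear output."""
    return {
        "W1": rng.normal(scale=0.1, size=(h, d)), "b1": np.zeros(h),
        "w2": rng.normal(scale=0.1, size=h),      "b2": 0.0,
    }

def score(params, x):
    """f(x): sigmoid hidden layer followed by a linear output node."""
    hidden = 1.0 / (1.0 + np.exp(-(params["W1"] @ x + params["b1"])))
    return float(params["w2"] @ hidden + params["b2"])
```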
In [25], a nested ranker is built on top of RankNet to further improve retrieval performance. Specifically, the new method iteratively re-ranks the top-scoring documents: at each iteration, the RankNet algorithm is used to re-rank a subset of the results. This splits the problem into smaller, easier tasks and generates a new distribution of results for the algorithm to learn. Experimental results show that making the learning algorithm iteratively concentrate on the top-scoring results can improve the accuracy of the top ten documents.
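The exact splitting schedule of [25] is not reproduced here; the sketch below only illustrates the general iterative scheme. It assumes a hypothetical train_ranknet helper that trains a RankNet model on a subset and returns a scoring function, and the subset sizes are arbitrary placeholders.

```python
def nested_rerank(docs, features, train_ranknet, subset_sizes=(1000, 100, 10)):
    """Iteratively re-rank ever-smaller prefixes of the current ranking.
    `train_ranknet(features, subset)` is an assumed helper returning a
    scoring function; subset sizes are illustrative, not from [25]."""
    ranking = list(docs)
    for k in subset_sizes:
        top, rest = ranking[:k], ranking[k:]
        f = train_ranknet(features, top)           # train on the current top-k
        top.sort(key=lambda d: f(features[d]), reverse=True)
        ranking = top + rest                       # only the top-k is re-ordered
    return ranking
```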
3.2.4 FRank: Ranking with a Fidelity Loss
Some problems with the loss function used in RankNet have been pointed out in [33]. Specifically, the curve of the cross entropy loss as a function of $f(x_u) - f(x_v)$ is plotted in Fig. 3.4. From this figure, one can see that in some cases the cross entropy loss has a non-zero minimum, indicating that there will always be some loss no matter what kind of model is used. This may not be in accordance with our intuition of a loss function. Furthermore, the loss is not bounded, which may lead to the dominance of some difficult document pairs in the training process.
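Both issues can be checked numerically. The sketch below evaluates the cross entropy loss $C = -\bar{P}\log P - (1 - \bar{P})\log(1 - P)$, with $P$ the logistic function of $z = f(x_u) - f(x_v)$; the choice of target probability 0.5 (a tied pair) is one illustrative case in which the minimum is non-zero.

```python
import numpy as np

def cross_entropy(z, p_bar):
    """Cross entropy loss -P_bar*log(P) - (1-P_bar)*log(1-P), P = sigmoid(z)."""
    p = 1.0 / (1.0 + np.exp(-z))
    return -p_bar * np.log(p) - (1.0 - p_bar) * np.log(1.0 - p)

z = np.linspace(-10.0, 10.0, 2001)

# Non-zero minimum: for a tied pair (target probability 0.5) the loss
# bottoms out at ln 2 > 0, so no model can drive it to zero.
print(cross_entropy(z, 0.5).min())   # ~0.693

# Unboundedness: a badly mis-ranked pair incurs an arbitrarily large
# loss, so a few difficult pairs can dominate training.
print(cross_entropy(-10.0, 1.0))     # ~10.0
```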