Information Technology Reference
In-Depth Information
defined. For example, the utility can be any evaluation measure. Suppose the utility
is denoted as U(
·
) , then the following expected utility is used as the objective for
learning:
E p( s | x ,w) U( s , y ) ,
U(w
;
=
x , y )
(4.6)
where w is the parameter of the ranking function, and s is the ranking scores of the
documents.
With the above definition, although the utility itself may be discontinuous and
non-differentiable, it does not contain any model parameter and will not bring trou-
ble to the learning process. What contains the model parameter is the conditional
probability p( s
|
x ,w) .
In
[ 37 ],
the
independence
between
the
documents
(i.e.,
p( s
|
x ,w)
=
j p(s j |
x j ,w) ) is assumed, and a generalized linear model [ 22 ] is used to defined
p(s j |
x j ,w) , the conditional probability for each single document. A generalized
linear model consists of a likelihood model p(y
θ) , a linear combination of the in-
puts and model parameters w T x , and a link function that maps the parameter θ to
the real line. In particular, a binomial function (i.e., Bin (
|
) ) is used as the likelihood
model in [ 37 ], and a cumulative normal function (i.e., Ψ( · ) ) is used to define the
·
Bin s j ;
Ψ w T x j ;
,m ,
0 , 1
π
p(s j |
=
x ,w)
(4.7)
where m is the number of documents.
Then the algorithm maximizes the expected utility (equivalently, the negative
utility can be regarded as the loss function). In particular, the approximate Bayesian
inference procedure [ 18 ] with a factorized Gaussian prior is used to learn the model
parameters.
Note that in addition to optimizing evaluation measures, the decision theoretical
framework can also be used to optimize other utilities, e.g., the probability of user
clicks on the documents.
4.2.1.3 Approximate Rank
In [ 25 ], Qin et al. point out that the underlying reason why evaluation measures
are non-smooth is that rank positions are non-smooth with respect to ranking
scores. Therefore, they propose performing approximation to the rank positions us-
ing smooth functions of the ranking scores, such that the approximate evaluation
measures can consequently become differentiable and easier to optimize.
Here we take NDCG as an example to illustrate the approximation process. The
same idea also applies to MAP. For more details, please refer to [ 25 ].
If one changes the index of summation in the definition of NDCG (see Chap. 1)
from the positions in the ranking result to the indexes of documents, NDCG can be
Search WWH ::

Custom Search