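where

$$
h_i(x_u, x_v) = \mathrm{sigmoid}\Big(\sum_t \big(\theta_{u,t,i}\, x_{u,t} + \theta_{v,t,i}\, x_{v,t}\big) + b_i\Big) = \mathrm{sigmoid}\Big(\sum_t \big(\theta_{u,t,i'}\, x_{v,t} + \theta_{v,t,i'}\, x_{u,t}\big) + b_{i'}\Big) = h_{i'}(x_v, x_u). \tag{3.6}
$$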
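The symmetry in (3.6) can be checked numerically. The sketch below assumes that hidden units come in dual pairs (written here as `i` and `ip`) that share weights with the two input roles swapped and use the same bias. The weight values, feature vectors, and function names are made up for illustration and are not taken from the source.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def hidden(theta_u, theta_v, b, x_u, x_v):
    """One hidden unit: sigmoid(sum_t(theta_u[t]*x_u[t] + theta_v[t]*x_v[t]) + b)."""
    z = sum(w * x for w, x in zip(theta_u, x_u))
    z += sum(w * x for w, x in zip(theta_v, x_v))
    return sigmoid(z + b)

# Illustrative (made-up) weights for hidden unit i.
theta_u_i, theta_v_i, b_i = [0.5, -1.0], [0.2, 0.3], 0.1

# Assumed weight sharing: the dual unit i' reuses the same weights
# with the roles of the two inputs swapped, and the same bias.
theta_u_ip, theta_v_ip, b_ip = theta_v_i, theta_u_i, b_i

x_u, x_v = [1.0, 2.0], [0.5, -1.5]

h_i = hidden(theta_u_i, theta_v_i, b_i, x_u, x_v)       # h_i(x_u, x_v)
h_ip = hidden(theta_u_ip, theta_v_ip, b_ip, x_v, x_u)   # h_i'(x_v, x_u)
print(abs(h_i - h_ip) < 1e-12)
```

Under this sharing scheme, swapping the two input documents only swaps the activations of each dual pair of hidden units.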
Then, the optimal parameters $\theta$, $w$, and $b$ are learned by minimizing the following loss function:
$$
L(h; x_u, x_v, y_{u,v}) = \big(y_{u,v} - P(x_u \succ x_v)\big)^2 + \big(y_{v,u} - P(x_u \prec x_v)\big)^2. \tag{3.7}
$$
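As a small worked example, the squared loss in (3.7) can be evaluated for one document pair. The sketch assumes the targets are encoded as 1 for the preferred direction and 0 for the other; that encoding, the function name, and the sample numbers are illustrative assumptions.

```python
def pairwise_squared_loss(p_u_over_v, p_v_over_u, y_uv, y_vu):
    """Squared error between the two target labels and the two
    predicted preference outputs P(x_u > x_v) and P(x_u < x_v)."""
    return (y_uv - p_u_over_v) ** 2 + (y_vu - p_v_over_u) ** 2

# Assumed encoding: x_u is preferred, so y_uv = 1 and y_vu = 0.
good = pairwise_squared_loss(0.8, 0.2, y_uv=1, y_vu=0)  # mostly correct prediction
bad = pairwise_squared_loss(0.2, 0.8, y_uv=1, y_vu=0)   # reversed prediction
print(good < bad)
```

The loss is zero only when both outputs match their targets exactly, and it grows as the predicted preference moves toward the wrong direction.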
For testing, the learned preference function is used to generate pairwise preferences for all possible document pairs. Then an additional sorting (aggregation) step, just as in [12], is used to resolve the conflicts in these pairwise preferences and to generate a final ranked list.
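To make the aggregation step concrete, here is a minimal sketch of one simple greedy scheme: repeatedly pick the document whose preferences over the remaining documents are strongest. Whether this matches the exact procedure of [12] is an assumption; the function name `greedy_order` and the toy preference table are hypothetical.

```python
def greedy_order(docs, pref):
    """Greedily order docs using pairwise preferences pref(u, v) in [0, 1]."""
    remaining = list(docs)
    # Potential of u: how strongly u beats the others minus how strongly it is beaten.
    potential = {
        u: sum(pref(u, v) - pref(v, u) for v in remaining if v != u)
        for u in remaining
    }
    ranked = []
    while remaining:
        best = max(remaining, key=lambda u: potential[u])
        ranked.append(best)
        remaining.remove(best)
        for u in remaining:
            # Drop the contribution of the document that was just ranked.
            potential[u] -= pref(u, best) - pref(best, u)
    return ranked

# Toy preferences containing a conflict: a > b, b > c, but c > a.
table = {
    ("a", "b"): 0.9, ("b", "a"): 0.1,
    ("b", "c"): 0.8, ("c", "b"): 0.2,
    ("a", "c"): 0.4, ("c", "a"): 0.6,
}
ranked = greedy_order(["c", "b", "a"], lambda u, v: table[(u, v)])
print(ranked)
```

Even though the toy preferences form a cycle, the procedure always returns a total order, because each step commits to one document and then re-scores the rest.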
3.2.3 RankNet: Learning to Rank with Gradient Descent
RankNet [8] is one of the learning-to-rank algorithms used by commercial search engines.¹ In RankNet, the loss function is also defined on a pair of documents, but the hypothesis is defined with the use of a scoring function $f$. Given two documents $x_u$ and $x_v$ associated with a training query $q$, a target probability $\bar{P}_{u,v}$ is constructed based on their ground truth labels. For example, we can define $\bar{P}_{u,v} = 1$ if $y_{u,v} = 1$, and $\bar{P}_{u,v} = 0$ otherwise. Then, the modeled probability $P_{u,v}$ is defined based on the difference between the scores of these two documents given by the scoring function, i.e.,
$$
P_{u,v}(f) = \frac{\exp\big(f(x_u) - f(x_v)\big)}{1 + \exp\big(f(x_u) - f(x_v)\big)}. \tag{3.8}
$$
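The modeled probability in (3.8) is a logistic function of the score difference. A minimal sketch follows, taking the two scores $f(x_u)$ and $f(x_v)$ as plain numbers; the function name is hypothetical, and the branch on the sign of the difference is an implementation choice to avoid overflow rather than part of the definition.

```python
import math

def modeled_probability(score_u, score_v):
    """P_{u,v}(f) = exp(d) / (1 + exp(d)), with d = f(x_u) - f(x_v)."""
    d = score_u - score_v
    # Two algebraically equal forms, chosen so exp() never overflows.
    if d >= 0:
        return 1.0 / (1.0 + math.exp(-d))
    e = math.exp(d)
    return e / (1.0 + e)

p_equal = modeled_probability(1.0, 1.0)   # equal scores
p_uv = modeled_probability(2.0, 0.0)      # x_u scored higher
p_vu = modeled_probability(0.0, 2.0)      # same pair, roles swapped
print(p_equal, p_uv + p_vu)
```

Equal scores give a probability of one half, and swapping the two documents yields the complementary probability.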
Then the cross entropy between the target probability $\bar{P}_{u,v}$ and the modeled probability $P_{u,v}(f)$ is used as the loss function, which we refer to as the cross entropy loss for short:

$$
L(f; x_u, x_v, y_{u,v}) = -\bar{P}_{u,v} \log P_{u,v}(f) - \big(1 - \bar{P}_{u,v}\big) \log\big(1 - P_{u,v}(f)\big). \tag{3.9}
$$
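The cross entropy loss in (3.9) and its derivative with respect to the score difference can be sketched as follows. Writing $d = f(x_u) - f(x_v)$, the loss simplifies to $\log(1 + e^{d}) - \bar{P}_{u,v}\, d$ and its derivative with respect to $d$ is $P_{u,v}(f) - \bar{P}_{u,v}$. The function names are hypothetical, and the numerically stable rewriting is an implementation choice, not something stated in the source.

```python
import math

def cross_entropy_loss(score_u, score_v, target):
    """Loss (3.9) for one pair, rewritten as log(1 + exp(d)) - target * d."""
    d = score_u - score_v
    # Stable softplus: log(1 + exp(d)) = max(d, 0) + log1p(exp(-|d|)).
    softplus = max(d, 0.0) + math.log1p(math.exp(-abs(d)))
    return softplus - target * d

def loss_gradient(score_u, score_v, target):
    """Derivative of the loss with respect to d: modeled minus target probability."""
    d = score_u - score_v
    if d >= 0:
        p = 1.0 / (1.0 + math.exp(-d))
    else:
        e = math.exp(d)
        p = e / (1.0 + e)
    return p - target

# Target 1.0 means x_u should be ranked above x_v.
loss_right = cross_entropy_loss(2.0, 0.0, target=1.0)  # correct order
loss_wrong = cross_entropy_loss(0.0, 2.0, target=1.0)  # reversed order
grad_tied = loss_gradient(0.0, 0.0, target=1.0)        # tied scores
print(loss_right < loss_wrong, grad_tied)
```

A gradient descent step on the parameters of $f$ would chain this derivative through $f(x_u)$ and $f(x_v)$: a negative value indicates that increasing the score gap in favor of $x_u$ lowers the loss.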
¹ As far as we know, Microsoft Bing Search (http://www.bing.com/) is using the model trained with a variation of RankNet.