Information Technology Reference
In-Depth Information
(i.e., if α u =
1, then the document is selected, and otherwise not selected). Only
when both documents associated with a query are selected, can the corresponding
document pair be selected. Therefore, the PPC of the selected subset can be calcu-
lated as follows (here the inner product is used as the similarity function),
x u
x v T x q
v α u α v α q
1
x q
u α q
u
v .
(13.13)
m q ˜
˜
m q
u,v
q,q
u ,v
The task is to find optimal indicator variables α u , such that the PPC of the se-
lected subset can be maximized. To tackle this task, an efficient stagewise solution
is proposed. That is, one first finds the optimal selection of document pairs, and then
take it as a given condition to find the optimal selection of documents.
Specifically, the sub-task of document pair selection is formulated as follows.
Given a training data collection S , we use variable ω u,v to indicate whether the
document pair (x u ,x v ) is selected or not. Then the PPC of the selected subset of
document pairs can be written as follows:
u ,v x u
x v T x q
v =
1
ω u,v ω q
x q
ω T R T Rω,
PPC(S, ω)
=
u
m q ˜
˜
m q
u,v
q,q
u ,v
(13.14)
= q
where ω is a vector of dimension p ( p
m q is the total number of document
pairs); its element is ω u,v ; R is a f × p matrix ( f is the number of features) with
each column representing
˜
x u
x v
m q .
Considering that reducing the size of the training data may potentially hurt the
generalization ability of learning algorithms, the authors of [ 16 ] propose maximiz-
ing the number of selected document pairs at the same time as maximizing the above
PPC. As a result, the following optimization problem is obtained:
˜
μ 2 e T ω 2 ,
ω T R T
max
ω
+
(13.15)
where the parameter μ is a tradeoff coefficient.
In order to efficiently solve the above optimization problem, the following strate-
gies are used in [ 16 ]:
Conducting eigendecomposition on R T R .
Projecting variable ω to the linear space spanned by the eigenvectors, so as to
transform the original problem to its equivalent form.
Solving the equivalent problem by maximizing a lower bound of its objective.
After obtaining the optimal selection of document pairs, the sub-task of docu-
ment selection is formulated as below.
min
u
α u α v
ω u,v 2 ,
(13.16)
v
where m q denotes the total number of documents associated with query q .
Search WWH ::




Custom Search