In standard IR, other things being equal, if a query term occurs more frequently in document $d_1$ than in $d_2$, then $d_1$ gets a somewhat larger score than $d_2$. In our setting, it is unclear whether multiple occurrences of a selector should activate the candidate position any more than a single occurrence. In our experiments, we simply ignored all but the nearest occurrence of each selector, in effect setting $\oplus$ to max. Sum ($\Sigma$) behaves poorly as $\oplus$ because even a low-IDF selector can boost the score of a non-answer candidate token if it appears a few times near the candidate. Apart from max and $\Sigma$, it might be worthwhile experimenting with very slow-growing functions of the selector multiplicity. For combining the activations contributed by different selectors, sum performs quite well, i.e., we add up the activation from different selectors. Here, too, some degree of non-linearity may be worth exploring.
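As a concrete illustration, the following Python sketch contrasts the two choices of $\oplus$; the selector terms and energy values are hypothetical, made up for the example.

```python
# Hypothetical decayed activations contributed by each occurrence of a
# selector near one candidate position.
occurrence_energies = {
    "canal": [0.9, 0.7, 0.6],  # high-IDF selector, repeated nearby
    "the":   [0.1, 0.1, 0.1],  # low-IDF selector, also repeated
}

def candidate_score(energies_by_selector, combine):
    """Apply ⊕ (`combine`) over the occurrences of each selector, then
    sum the resulting activations across different selectors."""
    return sum(combine(es) for es in energies_by_selector.values())

# ⊕ = max: only the nearest (strongest) occurrence of each selector counts.
print(candidate_score(occurrence_energies, max))  # 0.9 + 0.1 = 1.0

# ⊕ = sum: the repeated low-IDF selector inflates the score.
print(candidate_score(occurrence_energies, sum))  # 2.2 + 0.3 = 2.5
```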
10.3.2 Learning the Proximity Scoring Function
For simplicity, we will limit our attention to the $W$ tokens to the left and right of the candidate position numbered 0 in Figure 10.13. If the word/term at offset $o$ is $t_o$, we can rewrite (10.2) as

$$\mathrm{score}(a) \;=\; \sum_{o=-W}^{W} \beta_o \,\mathrm{energy}(t_o)\,\mathrm{nearest?}(t_o, o, a) \;=\; \beta^\top x, \quad\text{where } x_o = \mathrm{energy}(t_o)\,\mathrm{nearest?}(t_o, o, a), \tag{10.3}$$
and $\mathrm{nearest?}(t, o, a)$ is 1 if the nearest occurrence of word $t$ to candidate $a$ is at offset $o$, and 0 otherwise. Ties are broken arbitrarily. In the final dot-product form, $x, \beta \in \mathbb{R}^{2W+1}$.
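To make the feature construction concrete, here is a minimal Python sketch of (10.3); `energy` stands in for whatever per-term activation (10.2) defines, and the boundary handling is an assumption of this sketch.

```python
def nearest_offsets(tokens, center, W):
    """Map each term seen in the +/-W window around `center` to the offset
    of its occurrence nearest to the candidate (ties broken toward the
    left, i.e., arbitrarily)."""
    nearest = {}
    for o in range(-W, W + 1):
        i = center + o
        if 0 <= i < len(tokens):
            t = tokens[i]
            if t not in nearest or abs(o) < abs(nearest[t]):
                nearest[t] = o
    return nearest

def feature_vector(tokens, center, W, energy):
    """Build x in R^(2W+1): x_o = energy(t_o) when t_o's nearest occurrence
    is at offset o, else 0.  Slot o+W maps offsets -W..W onto 0..2W."""
    x = [0.0] * (2 * W + 1)
    for t, o in nearest_offsets(tokens, center, W).items():
        x[o + W] = energy(t)
    return x
```

The score of a candidate is then just the dot product, e.g. `sum(b * v for b, v in zip(beta, x))`.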
In our implementation we made a few further simplifications. First, we prevented the candidate token from endorsing itself, even if it was also a selector. Consider the question “Which person designed the Panama Canal?” with atype person#n#1. We are certainly not interested in an answer token person. Therefore, $o = 0$ is excluded from the sum above. Second, we ignore the distinction between tokens to the left and right of $a$, i.e., we constrain $\beta_{-o} = \beta_o$ and add up $x_{-o}$ and $x_o$ suitably. This means that, in our implementation, $x, \beta \in \mathbb{R}^{W}$.
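A small sketch of these two simplifications, continuing the `feature_vector` sketch above (the indexing convention here is ours, not necessarily the authors'):

```python
def fold_features(x, W):
    """Drop o = 0 (the candidate cannot endorse itself) and, since we
    constrain beta[-o] = beta[o], add the mirrored entries x[-o] + x[o].
    The result lies in R^W, indexed by distance 1..W from the candidate."""
    # x is the (2W+1)-vector from feature_vector(); slot W holds offset 0.
    return [x[W - o] + x[W + o] for o in range(1, W + 1)]
```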
Suppose $x^+$ is the feature vector corresponding to a snippet where position $a$ is indeed an answer to the query. Let $x^-$ be a feature vector representing a snippet that does not contain an answer. Then we want our scoring model $\beta$ to satisfy $\beta^\top x^+ > \beta^\top x^-$. Suppose relevance feedback is available in the form of a set of preference pairs $i \prec j$, meaning that the candidate position $i$ should appear lower in the ranked list than position $j$. This is now similar to Joachims' RankSVM setting (21), and we can use his SVM formulation:
$$\min_{\beta,\; s \ge 0} \;\; \tfrac{1}{2}\,\beta^\top \beta \;+\; C \sum_{i \prec j} s_{ij} \qquad \text{s.t.} \;\; \forall\, i \prec j:\;\; \beta^\top x_i + 1 < \beta^\top x_j + s_{ij} \tag{10.4}$$
As with support vector classifiers, $C$ is a tuned parameter that trades off the model complexity $\|\beta\|$ against violations of the snippet ordering requirements.
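The program (10.4) can be handed to any SVM solver. A common shortcut for the linear case, sketched below, rewrites each constraint $\beta^\top x_j - \beta^\top x_i \ge 1 - s_{ij}$ as a soft-margin classification of the difference vector $x_j - x_i$ with label $+1$, so an off-the-shelf linear SVM recovers $\beta$; here scikit-learn's `LinearSVC` merely stands in for Joachims' SVMlight.

```python
import numpy as np
from sklearn.svm import LinearSVC

def train_rank_svm(X, preference_pairs, C=1.0):
    """Fit beta for the ranking constraints of (10.4).

    X is an (n_snippets, W) feature matrix; preference_pairs contains
    (i, j) with i ≺ j, i.e., snippet i should rank below snippet j.
    """
    diffs, labels = [], []
    for i, j in preference_pairs:
        diffs.append(X[j] - X[i])  # should score >= +1 (up to slack)
        labels.append(+1)
        diffs.append(X[i] - X[j])  # mirrored copy balances the two classes
        labels.append(-1)
    svm = LinearSVC(C=C, loss="hinge", fit_intercept=False)
    svm.fit(np.array(diffs), np.array(labels))
    return svm.coef_.ravel()  # beta; rank candidates by descending beta @ x
```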