In standard IR, other things being equal, if a query term occurs more frequently in document d_1 than in d_2, then d_1 gets a somewhat larger score than d_2. In our setting, it is unclear whether multiple occurrences of a selector should activate the candidate position any more than a single occurrence would. In our experiments, we simply ignored all but the nearest occurrence of each selector, in effect combining the occurrences of a single selector with max. Sum (Σ) behaves poorly as the within-selector combination, because even a low-IDF selector can boost the score of a non-answer candidate token if it appears a few times near the candidate. Apart from max and Σ, it might be worthwhile to experiment with very slow-growing functions of the selector multiplicity. For combining the activations contributed by different selectors, on the other hand, sum performs quite well, i.e., we add up the activations from different selectors. Here, too, some degree of non-linearity may be worth exploring.
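To make the two combination steps concrete, here is a minimal sketch: within one selector only the nearest occurrence counts (a max over occurrences), and the resulting activations are summed across selectors. The `decay` function and the IDF weighting used below are illustrative assumptions, not the exact energy definition from the text.

```python
# Illustrative sketch of the two combination choices discussed above.
# `idf` and `decay` are assumed stand-ins for a selector's energy.

def selector_activation(candidate_pos, occurrence_positions, idf,
                        decay=lambda gap: 1.0 / (1.0 + gap)):
    """Activation contributed by ONE selector: only its nearest occurrence
    counts, i.e., a max over occurrences. Summing over occurrences instead
    would let a repeated low-IDF selector inflate a non-answer candidate."""
    return idf * max(decay(abs(candidate_pos - p)) for p in occurrence_positions)

def candidate_score(candidate_pos, selectors):
    """Across DIFFERENT selectors we simply add the activations."""
    return sum(selector_activation(candidate_pos, positions, idf)
               for idf, positions in selectors)

# Example: candidate token at position 12, two selectors given as
# (idf, list of occurrence positions).
print(candidate_score(12, [(2.3, [10, 15]), (0.4, [11, 13, 14])]))
```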
10.3.2 Learning the Proximity Scoring Function
For simplicity, we will limit our attention to the W tokens to the left and right of the candidate position, numbered 0 in Figure 10.13. If the word/term at offset o is t_o, we can rewrite (10.2) as

score(a) = \sum_{o=-W}^{W} \beta_o \, \underbrace{\mathrm{energy}(t_o)\, \mathrm{nearest?}(t_o, o, a)}_{=\, x_o} = \beta^\top x        (10.3)
where nearest?(t, o, a) is 1 if the nearest occurrence of word t to candidate a is at offset o, and 0 otherwise. Ties are broken arbitrarily. In the final dot-product form, x, β ∈ ℝ^{2W+1}.
In our implementation we made a few further simplifications. First, we prevented the candidate token from endorsing itself, even if it was also a selector. Consider the question “Which person designed the Panama Canal?” with atype person#n#1. We are certainly not interested in an answer token person. Therefore, o = 0 is excluded from the sum above. Second, we ignore the distinction between tokens to the left and right of a, i.e., we constrain β_{−o} = β_o and add up x_{−o} and x_o suitably. This means, in our implementation, x, β ∈ ℝ^W.
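The following sketch builds the folded W-dimensional feature vector just described: the candidate position itself (o = 0) is skipped, left and right offsets are merged, and each selector contributes its energy only at the offset of its nearest occurrence. The `energy` callable is an assumed parameter standing in for the text's activation weighting, not its exact definition.

```python
import numpy as np

def candidate_features(tokens, a, W, energy):
    """Feature vector x in R^W for candidate position `a` (equation (10.3)
    with the two simplifications: o = 0 excluded, beta_{-o} = beta_o).

    `energy` is an assumed callable mapping a token to its activation
    weight (e.g., IDF-based); its exact form is not specified here.
    """
    # Nearest |offset| (1..W) at which each distinct token occurs around `a`.
    nearest = {}
    for o in range(-W, W + 1):
        pos = a + o
        if o == 0 or pos < 0 or pos >= len(tokens):
            continue  # skip the candidate itself and out-of-range offsets
        t = tokens[pos]
        nearest[t] = min(nearest.get(t, W + 1), abs(o))

    # nearest?(t, o, a) = 1 only at each token's nearest offset; ties between
    # -o and +o fold into the same bucket, so the arbitrary tie-break is moot.
    x = np.zeros(W)
    for t, o in nearest.items():
        x[o - 1] += energy(t)
    return x

def proximity_score(x, beta):
    """Equation (10.3) in its final dot-product form, score(a) = beta . x."""
    return float(beta @ x)
```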
Suppose x_+ is the feature vector corresponding to a snippet where position a is indeed an answer to the query. Let x_− be a feature vector representing a snippet that does not contain an answer. Then we want our scoring model β to satisfy β^\top x_+ > β^\top x_−. Suppose relevance feedback is available in the form of a set of preference pairs i ≺ j, meaning that candidate position i should appear lower in the ranked list than position j. This is now similar to Joachims's RankSVM setting (21), and we can use his SVM formulation:
\min_{\beta,\, s \ge 0} \;\; \tfrac{1}{2}\, \beta^\top \beta + C \sum_{i \prec j} s_{ij} \quad \text{s.t.} \quad \forall\, i \prec j: \;\; \beta^\top x_i + 1 \le \beta^\top x_j + s_{ij}        (10.4)
As with support vector classifiers, C is a tuned parameter that trades off the model complexity ‖β‖ against violations of the snippet ordering requirements.
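A common way to solve a problem of the form (10.4) is Joachims's pairwise reduction: each preference i ≺ j becomes a difference vector that a linear SVM must push past the margin. The sketch below uses scikit-learn's LinearSVC as a stand-in for SVM-light; the function name and data layout are assumptions for illustration.

```python
import numpy as np
from sklearn.svm import LinearSVC

def train_rank_svm(X, preference_pairs, C=1.0):
    """Learn beta from (10.4)-style preferences.

    X[i] is the feature vector of candidate position i;
    preference_pairs contains (i, j) meaning i should rank below j.
    Each pair yields the difference x_j - x_i, which the learned beta
    should score positive (and its mirror negative), recovering the
    constraint beta.x_j >= beta.x_i + 1 - s_ij up to the hinge slack.
    """
    diffs, labels = [], []
    for i, j in preference_pairs:
        diffs.append(X[j] - X[i]); labels.append(+1)
        diffs.append(X[i] - X[j]); labels.append(-1)  # mirrored for a two-class problem
    svm = LinearSVC(C=C, loss="hinge", dual=True, fit_intercept=False)
    svm.fit(np.asarray(diffs), np.asarray(labels))
    return svm.coef_.ravel()  # beta; rank candidate positions by beta @ x
```

At query time, candidate positions are then ranked by the learned dot product β^\top x, exactly as in (10.3).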