[Plot: rough and smooth β_j against gap j; horizontal axis "Gap j" from 0 to 50, vertical axis from 0 to 1.]
FIGURE 10.15 : β_j shows a noisy unimodal pattern.
10.3.3.4 Accuracy using the fitted decay
Finally, we plug in the smooth β in place of decay and make an end-to-end
evaluation of the snippet ranking system. In a standard IR system (39), the
score of a snippet would be decided by a vector space model using selectors
alone. We gave the standard score the additional benefit of considering only
those snippets centered at an atype candidate, and considering each matched
selector only once (i.e., use only IDF and not TF). Even so, a basic IR scoring
approach was significantly worse than the result of plugging in β, as shown in
Figure 10.16. “R300” is the fraction of truly relevant snippets recovered within
the first 300 positions. The “reciprocal rank” for a fixed question is one divided
by the first rank at which an answer snippet was found. Mean reciprocal rank
or MRR is the reciprocal rank averaged over queries. Both recall and MRR over
held-out test data improve substantially compared to the IR baseline.
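The two evaluation measures described above can be sketched in a few lines of Python. This is a minimal illustration, not the evaluation code used for Figure 10.16. The function names and the input layout (one list of relevance flags per question, ordered by rank) are hypothetical. Two assumptions are made: a question with no answer snippet contributes 0 to MRR, and the top-k measure counts questions with at least one answer snippet in the first k positions, which is how the whole-number R300 values in Figure 10.16 appear to be reported.

```python
from typing import Optional, Sequence


def first_relevant_rank(relevance: Sequence[bool]) -> Optional[int]:
    """Return the 1-based rank of the first answer snippet, or None if absent."""
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            return rank
    return None


def mean_reciprocal_rank(runs: Sequence[Sequence[bool]]) -> float:
    """Average of 1/rank of the first answer snippet over all questions.

    Assumption: a question with no answer snippet contributes 0.
    """
    total = 0.0
    for relevance in runs:
        rank = first_relevant_rank(relevance)
        if rank is not None:
            total += 1.0 / rank
    return total / len(runs)


def answered_within_k(runs: Sequence[Sequence[bool]], k: int) -> int:
    """Count questions with at least one answer snippet in the top k ranks.

    Assumption: this count-based reading of recall at k is one possible
    interpretation of R300, chosen for illustration.
    """
    return sum(1 for relevance in runs if any(relevance[:k]))


# Three toy questions: first answer at rank 2, at rank 1, and never.
toy_runs = [[False, True, False], [True], [False, False]]
mrr = mean_reciprocal_rank(toy_runs)
hits_at_2 = answered_within_k(toy_runs, 2)
```

With the toy input, the reciprocal ranks are 1/2, 1 and 0, so the mean is 0.5, and two of the three questions are answered within the first two positions.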
β from     Train   Test   R300   MRR
IR-IDF     -       2000   211    0.16
RankExp    1999    2000   231    0.27
RankExp    2000    2000   235    0.31
RankExp    2001    2000   235    0.29
FIGURE 10.16 : End-to-end accuracy using RankExp β is significantly
better than IR-style ranking. Train and test years are from 1999, 2000, 2001.
R300 is recall at k = 300 out of 261 test questions. C = 0.1, C = 1 and
C = 10 gave almost identical results.
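The difference between the IR-IDF baseline and a β-weighted score can be illustrated with a small sketch. It is not the authors' RankExp implementation. The smoothed IDF formula, the choice to sum contributions over selector occurrences, and every name below (`idf`, `ir_idf_score`, `decay_score`, `doc_freq`, `beta`) are assumptions made for illustration. The baseline counts each matched selector once with its IDF weight. The β-weighted score multiplies each selector occurrence's IDF by the fitted decay value at its gap from the candidate position.

```python
import math
from typing import Dict, Sequence, Set


def idf(term: str, doc_freq: Dict[str, int], num_docs: int) -> float:
    """Smoothed inverse document frequency (an assumed, common variant)."""
    return math.log((1 + num_docs) / (1 + doc_freq.get(term, 0)))


def ir_idf_score(tokens: Sequence[str], selectors: Set[str],
                 doc_freq: Dict[str, int], num_docs: int) -> float:
    """Baseline: each matched selector counts once (IDF only, no TF)."""
    matched = set(tokens) & selectors
    return sum(idf(t, doc_freq, num_docs) for t in matched)


def decay_score(tokens: Sequence[str], candidate_pos: int, selectors: Set[str],
                beta: Sequence[float], doc_freq: Dict[str, int],
                num_docs: int) -> float:
    """Weight each selector occurrence by beta[gap] from the candidate.

    Assumption: contributions are summed, and gaps beyond the fitted
    range of beta contribute nothing.
    """
    score = 0.0
    for pos, tok in enumerate(tokens):
        if pos == candidate_pos or tok not in selectors:
            continue
        gap = abs(pos - candidate_pos)
        if gap < len(beta):
            score += idf(tok, doc_freq, num_docs) * beta[gap]
    return score


# Toy snippet with the atype candidate "bell" at position 4.
tokens = ["who", "invented", "the", "telephone", "bell"]
selectors = {"invented", "telephone"}
doc_freq = {"invented": 1, "telephone": 4}  # hypothetical counts
beta = [0.0, 1.0, 0.5, 0.25]                # hypothetical fitted decay
baseline = ir_idf_score(tokens, selectors, doc_freq, num_docs=9)
weighted = decay_score(tokens, 4, selectors, beta, doc_freq, num_docs=9)
```

In the toy snippet, "telephone" sits at gap 1 and keeps its full IDF weight, while "invented" sits at gap 3 and is scaled by 0.25, so proximity to the candidate changes the ranking signal in a way the baseline cannot express.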
Observe that we used three years of TREC data (1999, 2000, 2001) for
training and one year (2000) for testing. The accuracy listed for training year
2000 is meant only for sanity-checking because the training set is the same as
 
 