[Figure: two panels plotted against Estimated Index Size (0.0E+0 to 2.0E+9) with log-scale y-axes; legend values 1.00E-15, 1.00E-06, 1.00E-03, 1.00E-01.]
FIGURE 10.28 (see color insert following page 130): Estimated space-time tradeoffs produced by AtypeSubsetChooser. The y-axis uses a log scale. Note that the curve for the parameter value 10⁻³ (suggested by Figure 10.19) has the lowest average bloat.
10.4.6.3 Query execution dynamics
increasing index sizes, broken down into Lucene scan+merge time taken if R = A (“FineTime”), Lucene scan+merge time using a generalized g if R ⊂ A (“PreTime”), and the post-filtering time (“PostTime”). As can be seen, there are regimes where scan time dominates and others where filtering time dominates. This highlights why the choice of a good R is a tricky operation: we cannot assume cost estimates that are any simpler.
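The breakdown above can be illustrated with a minimal sketch. It assumes that the total query time is FineTime when R = A, and PreTime plus PostTime when R ⊂ A. The names `QueryTimings`, `query_time`, and `dominant_phase` are hypothetical and not part of IR4QA or Lucene, and the timing values are made-up placeholders rather than measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryTimings:
    """Hypothetical per-query timings (arbitrary units), named after the text."""
    fine_time: float  # scan+merge time when every atype is indexed (R = A)
    pre_time: float   # scan+merge time using a generalized g (R is a proper subset of A)
    post_time: float  # post-filtering time after retrieval via g


def query_time(t: QueryTimings, full_index: bool) -> float:
    """Assumed total cost: FineTime if R = A, else PreTime + PostTime."""
    if full_index:
        return t.fine_time
    return t.pre_time + t.post_time


def dominant_phase(t: QueryTimings) -> str:
    """Report which phase dominates when a generalized g is used."""
    return "scan" if t.pre_time >= t.post_time else "filter"


# Placeholder numbers only, chosen to show a filter-dominated regime.
timings = QueryTimings(fine_time=12.0, pre_time=5.0, post_time=20.0)
total_with_subset = query_time(timings, full_index=False)
total_with_full_index = query_time(timings, full_index=True)
```

Even in this simplified form, the cost of a subset R depends on two terms that can each dominate, which is consistent with the observation that a single-term cost estimate would not capture both regimes.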
10.5 Conclusion
10.5.1 Summary
In this article we have described the IR4QA (Information Retrieval for
Question Answering) project.
Our starting point was to recognize that