[Figure: two panels, each plotted against Estimated Index Size (x-axis, 0.0E+0 to 2.0E+9) with a log-scale y-axis.]
FIGURE 10.28 (SEE COLOR INSERT FOLLOWING PAGE 130.):
Estimated space-time tradeoffs produced by AtypeSubsetChooser. The
y-axis uses a log scale. Note that the curve for =10 3 (suggested by
Figure 10.19) has the lowest average bloat.
10.4.6.3 Query execution dynamics
Figure 10.31 shows the average time taken per query, for various choices
of R with increasing index sizes, broken down into three components: the
Lucene scan+merge time when R = A (“FineTime”), the Lucene scan+merge time
using a generalized g when R ⊂ A (“PreTime”), and the post-filtering time
(“PostTime”). As can be seen, there are regimes where scan time dominates
and others where filtering time dominates. This highlights why choosing a
good R is tricky: we cannot assume cost estimates that are any simpler.
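The breakdown described above can be illustrated with a minimal sketch. It is not the instrumentation used in the project: the names QueryTiming, average_breakdown and dominant_component are hypothetical, the timings are assumed to be in milliseconds, and the post-filtering time is assumed to be zero when R = A because the index is then exact.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class QueryTiming:
    """Hypothetical per-query timings, assumed to be in milliseconds."""
    scan_merge_ms: float   # index scan+merge time for this query
    post_filter_ms: float  # post-filtering time (assumed 0 when R = A)


def average_breakdown(timings: List[QueryTiming], exact_index: bool) -> Dict[str, float]:
    """Average the per-query components over a non-empty batch of queries.

    exact_index=True models R = A, so scan+merge is reported as "FineTime";
    otherwise it is reported as "PreTime".
    """
    if not timings:
        raise ValueError("need at least one query timing")
    n = len(timings)
    scan = sum(t.scan_merge_ms for t in timings) / n
    post = sum(t.post_filter_ms for t in timings) / n
    scan_label = "FineTime" if exact_index else "PreTime"
    return {scan_label: scan, "PostTime": post}


def dominant_component(breakdown: Dict[str, float]) -> str:
    """Return the label of the component with the largest average time."""
    return max(breakdown, key=breakdown.get)
```

For example, two made-up queries with scan+merge times of 10 and 20 and post-filtering times of 30 and 50 average to a PreTime of 15 and a PostTime of 40, so filtering dominates in that regime. A different choice of R could reverse the ordering, which is the difficulty the paragraph points to.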
10.5 Conclusion
10.5.1 Summary
In this article we have described the IR4QA (Information Retrieval for
Question Answering) project.
Our starting point was to recognize that
 
 