Text Search-Enhanced with Types and Entities - Text Mining: Classification, Clustering, and Applications - page 275


[Figure 10.28 plot: two panels, each with x-axis "Estimated Index Size" (0 to 2.0E+9) and a log-scale y-axis; curves labeled 1.00E-15, 1.00E-06, 1.00E-03, 1.00E-01.]

FIGURE 10.28 (SEE COLOR INSERT FOLLOWING PAGE 130.): Estimated space-time tradeoffs produced by AtypeSubsetChooser. The y-axis uses a log scale. Note that the curve for the parameter value 10^-3 (suggested by Figure 10.19) has the lowest average bloat.

10.4.6.3 Query execution dynamics

Figure 10.31 shows the average time taken per query, for various choices of R with increasing index sizes. The time is broken down into three parts: the Lucene scan+merge time taken if R = A ("FineTime"), the Lucene scan+merge time using a generalized g if R ⊂ A ("PreTime"), and the post-filtering time ("PostTime"). As can be seen, there are regimes where scan time dominates and others where filtering time dominates. This highlights why choosing a good R is tricky: we cannot assume cost estimates that are any simpler.
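The three-way breakdown above can be illustrated with a minimal sketch. It is not the cost model used in the chapter. It assumes, purely for illustration, that scan+merge time and post-filtering time are each proportional to the number of postings scanned. The names QueryCost, estimate_cost, postings, generalization, scan_cost and filter_cost are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class QueryCost:
    """Per-query time split into the three components named in the text."""
    fine_time: float = 0.0  # scan+merge when the query atype itself is indexed
    pre_time: float = 0.0   # scan+merge over a generalization g (atype not indexed)
    post_time: float = 0.0  # post-filtering of candidates found via g

    @property
    def total(self) -> float:
        return self.fine_time + self.pre_time + self.post_time

    @property
    def dominant(self) -> str:
        """Report which regime this query falls into: 'scan' or 'filter'."""
        scan = self.fine_time + self.pre_time
        return "scan" if scan >= self.post_time else "filter"


def estimate_cost(atype, indexed, generalization, postings,
                  scan_cost=1.0, filter_cost=1.0):
    """Toy estimate (assumption): both costs scale linearly with postings scanned.

    atype          -- the answer type requested by the query
    indexed        -- the subset R of atypes that have their own index entries
    generalization -- maps an unindexed atype to its indexed generalization g
    postings       -- number of postings per atype
    """
    if atype in indexed:
        # The atype is indexed directly: only FineTime is incurred.
        return QueryCost(fine_time=scan_cost * postings[atype])
    # Otherwise scan the larger postings list of g, then filter the candidates.
    g = generalization[atype]
    scanned = postings[g]
    return QueryCost(pre_time=scan_cost * scanned,
                     post_time=filter_cost * scanned)


if __name__ == "__main__":
    postings = {"city": 100, "location": 1000}
    generalization = {"city": "location"}
    full = estimate_cost("city", {"city", "location"}, generalization, postings)
    pruned = estimate_cost("city", {"location"}, generalization, postings,
                           filter_cost=2.0)
    print(full.total, full.dominant)
    print(pruned.total, pruned.dominant)
```

Even in this simplified form, whether the scan or the filter term dominates depends on the relative per-posting costs and on how much larger the postings list of g is, which is consistent with the mixed regimes seen in Figure 10.31.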

10.5 Conclusion

10.5.1 Summary

In this article we have described the IR4QA (Information Retrieval for Question Answering) project. Our starting point was to recognize that
