Database Reference
In-Depth Information
because we are now obliged to screen the results using expensive forward
index accesses.
For the first part, we assume that the time spent scanning posting of the
atype a and intersecting them with selector postings takes time proportional
to corpusCount ( a ), independent of what the specific selectors are. This is
confirmed by Figure 10.23.
Histogram of Multimerge Access Time
80
60
40
20
0
0
0.05
0.1
0.15
0.2
0.25
Average Time (msec)
Histogram of Forward Index Access Time
60
45
30
15
0
0
2
4
6
8
Average Time (msec)
FIGURE 10.23 : t scan is suciently concentrated that replacing the
distribution by a constant number is not grossly inaccurate.
The second part depends on the average time t forward it takes to probe the
forward index for one document and do the reachability test, and on k ,the
number of results sought from the pre-generalized query. Like t scan , t forward is
also suciently peaked and centered to use a point estimate ( Figure 10.24 ).
The overall query bloat factor is therefore
t scan corpusCount ( g )+ k t forward
t scan corpusCount ( a )
Now we come to the question of what k should be. If we make the crude
 
Search WWH ::




Custom Search