Database Reference
In-Depth Information
500
400
300
200
100
0
0
5
10
15
20
25
Estimated Bloat
FIGURE 10.25 : Scatter of observed against estimated query bloat.
because 1. IO wait times are non-deterministic because of file-system buffering
and RAID, and 2. To remain practical, our model ignores the effect of
selectors. Similar variability is seen in the Bindings Engine (4, Figure 3,
page 447) as well. In the relational query optimizer literature, join size
estimates (and therefore CPU/IO cost estimates) are often relatively crude
(20) but nevertheless lead to reasonable query plans (14).
Ratio
Count
%
Ratio
Count
%
.5-1
16
11.6
10-20
110
79.7
1-2
78
56.5
20-50
123
89.1
2-5
93
67.3
50-100
128
92.8
6-10
104
75.3
100-200
138
100
FIGURE 10.26 : Histogram of observed-to-estimated bloat ratio for
individual queries with a specific R occupying an estimated 145 MB of atype
index.
For a specific R picked by AtypeSubsetChooser (described next, in
Section 10.4.5) and 138 sample queries where g
= a given R , Figure 10.26
shows the cumulative distribution of the ratio of the observed to estimated
bloat. As can be seen, 68% of the queries have observed bloats less than five
times the estimated bloats, and 75% are within 10
. The fit of observed
to estimated bloats is reasonable for most queries, with only a few queries
exhibiting a large difference between the two.
×
10.4.5 Choosing an Atype Subset
We thus have a bi-criteria optimization problem:
given the corpus,
query workload W and atype set A ,choose R
A so as to minimize
r∈R corpusCount ( r ) and also minimize the expected query bloat
queryProb W ( a ) queryBloat ( a, R )
(10.11)
a∈A
 
 
Search WWH ::




Custom Search