Database Reference
In-Depth Information
20
19
17
13
10
5
0123456789
(d) Max-diff histogram.
10 11 12 13 14 15 16 17 18 19
19
18
17
11
10
9
0123456789
(e) End-biased histogram.
10 11 12 13 14 15 16 17 18 19
FIGURE 2.5
(Continued).
In absence of multidimensional statistics, we also assume that predicates in
different columns are independent. Using these assumptions, there are simple
procedures to estimate the number of output tuples for each operator. For
instance, to estimate the cardinality of single-table range predicates we use
interpolation over the histogram buckets. Join estimation requires that we
combine information of two histograms, and involves aligning buckets and
relying on simplifying assumptions on data distributions. Group-by queries
leverage the number of distinct values, stored either globally at the column
level or inside each histogram bucket.
2.3 Enumeration Strategy
An enumeration algorithm must pick an ecient execution plan for an input
query by effectively exploring the search space. A software engineering consid-
eration while designing an enumeration algorithm is to allow it to gracefully
adapt to changes in the search space or cost model. Optimization architec-
tures built with this paradigm are called extensible optimizers . Building an
extensible optimize involves not just designing a better enumeration algorithm
but also providing an infrastructure for evolution of the optimizer design. In
this section we focus on three representative examples of such extensible opti-
mizers. We first discuss System-R, an elegant approach that helped fuel much
of the subsequent work in optimization. Then, we briefly review Starburst,
Search WWH ::




Custom Search