Database Reference
In-Depth Information
The finest and coarsest granularities give us two points of reference on
performance vs. number of distinct values. The exact shape of this curve
cannot be easily determined but in practice can be modeled as a smooth
logarithmic relationship. We can then perform a curve-fitting procedure
that gives as a formula to obtain, for a given number of distinct values,
the expected benefit of using such granularity for the corresponding
dimension column.
3. Define candidate granularities for each dimension column. Once we have
identified the finest granularity value, 4 to 10 progressively coarser gran-
ularities are usually useful. For instance, while considering a date col-
umn, we can imagine having granularity values on days, months, quar-
ters, and years. Similarly, for an integer column, we can increase the
finest granularity by using an exponential scale (i.e., dividing the corre-
sponding computed column by 2, 4, 8, and so on).
4. The previous steps allows us to (1) identify candidate dimensions, (2)
estimate the benefit of a candidate dimension at the finest granular-
ity, and (3) model the benefit of a candidate dimension at intermediate
granularity values. The search procedure aims to pick the best possi-
ble MDC from such a list of candidate dimensions and granularities.
One way to carry the search involves relying on a weighted randomized
procedure. Specifically, we select several combinations of dimensions at
various granularity levels in proportion to their relative benefit for the
input workload. Each such combination forms a candidate solution. We
then rank all candidate solutions by aggregated benefit. Since the can-
didate dimensions and granularities have been picked randomly, chances
are that the combined number of distinct values (and therefore the over-
head of the MDC) is larger than the storage budget. We then traverse
the list of candidates in decreasing order of benefit and pick the first one
that satisfies the storage budget. To eciently approximate the number
of distinct values of a candidate MDC, we rely on a precomputed sample
of the table and distinct value estimators.
The previous (simplified) technique can be extended in different ways, such
as by improving the granularity model and storage estimates in the presence
of data skew or by adaptively choosing the storage budget on a per-table basis.
More details can be found in the references at the end of this chapter.
9.4 Extensible Physical Design
In this chapter (and also in Chapter 8) we have discussed several exten-
sions to the original physical design problem, which address different physical
structures (e.g., materialized views and partitioning). Specifically, we have
Search WWH ::




Custom Search