Incorporating Other Physical Structures - Automated Physical Database Design and Tuning

Database Reference

In-Depth Information

The finest and coarsest granularities give us two points of reference on

performance vs. number of distinct values. The exact shape of this curve

cannot be easily determined but in practice can be modeled as a smooth

logarithmic relationship. We can then perform a curve-fitting procedure

that gives as a formula to obtain, for a given number of distinct values,

the expected benefit of using such granularity for the corresponding

dimension column.

3. Define candidate granularities for each dimension column. Once we have

identified the finest granularity value, 4 to 10 progressively coarser gran-

ularities are usually useful. For instance, while considering a date col-

umn, we can imagine having granularity values on days, months, quar-

ters, and years. Similarly, for an integer column, we can increase the

finest granularity by using an exponential scale (i.e., dividing the corre-

sponding computed column by 2, 4, 8, and so on).

4. The previous steps allows us to (1) identify candidate dimensions, (2)

estimate the benefit of a candidate dimension at the finest granular-

ity, and (3) model the benefit of a candidate dimension at intermediate

granularity values. The search procedure aims to pick the best possi-

ble MDC from such a list of candidate dimensions and granularities.

One way to carry the search involves relying on a weighted randomized

procedure. Specifically, we select several combinations of dimensions at

various granularity levels in proportion to their relative benefit for the

input workload. Each such combination forms a candidate solution. We

then rank all candidate solutions by aggregated benefit. Since the can-

didate dimensions and granularities have been picked randomly, chances

are that the combined number of distinct values (and therefore the over-

head of the MDC) is larger than the storage budget. We then traverse

the list of candidates in decreasing order of benefit and pick the first one

that satisfies the storage budget. To eciently approximate the number

of distinct values of a candidate MDC, we rely on a precomputed sample

of the table and distinct value estimators.

The previous (simplified) technique can be extended in different ways, such

as by improving the granularity model and storage estimates in the presence

of data skew or by adaptively choosing the storage budget on a per-table basis.

More details can be found in the references at the end of this chapter.

9.4 Extensible Physical Design

In this chapter (and also in Chapter 8) we have discussed several exten-

sions to the original physical design problem, which address different physical

structures (e.g., materialized views and partitioning). Specifically, we have

Search WWH ::

Custom Search

Home