Incorporating Other Physical Structures - Automated Physical Database Design and Tuning

Database Reference

In-Depth Information

even a very conservative cost fraction of 2% results in a large fraction

of column subsets being pruned (because many column subsets are not

referenced often enough or are mentioned only in cheap queries). To

further decrease the set of candidates we rank column subsets by some

effectiveness metric. A possible definition of effectiveness for a column

subset is given by the VPC metric, which captures the fraction of the

scanned data in a partition that would be useful in answering queries

( VPC is short for vertical partitioning confidence , and it is discussed

in detail in the references provided in Section 9.6). We then take the

top- k highest-ranked candidate subsets for each table in the workload

and can generate a candidate vertical partitioning. Specifically, the can-

didate vertical partitioning based on column subset C contains C and

the table key as one vertical partition and the remaining columns in

the table (plus the key) as another vertical partition. Every vertical

partitioning candidate therefore consists of exactly two partitions. We

also consider the case where the table is not vertically partitioned. We

next generate nonpartitioned candidate indexes and views as described

in Chapters 4 and 8 and additionally consider horizontally partitioning

each such candidate index and view. For range partitioning, we use the

specific values from query range predicates as boundary points. For hash

partitioning, we select number of partitions such that it is a multiple of

the number of processors and each partition fits into memory. Finally,

for each vertical partitioning candidate, we consider candidate indexes

and views on such vertical partition candidates if they are well defined

(i.e., if the set of columns of the index or view belongs to a single vertical

partition).

Merging. To explore the space of merged structures, we iterate over the

given set of candidates similarly to what we do in Section 8.4.1 in the

context of materialized views. At each iteration, we generate all candi-

dates that can be obtained by merging a pair of current candidates. We

then pick the best merged structures, replace their parent candidates,

and repeat this process. Thus, we return the maximal merged struc-

tures that can be obtained by repeatedly merging pairs of structures,

which are added to the original set of candidates. We next describe

how to merge a pair of physical design structures in the presence of

vertical and horizontal partitioning. First, we generate interesting ver-

tical partitioning candidates by merging vertical partitions that are the

output of the candidate selection step. Merging two partitioning candi-

dates involves manipulating two subsets of columns. There are different

ways to generate merged partitioning candidates, which involve per-

forming unions and intersections of the column subsets of the parent

candidates. Then, for each single vertical partition (including the new

merged ones), we merge all indexes and materialized views that are rele-

vant for that vertical partition while taking horizontal partitioning into

Automated Physical Database Design and Tuning

Search WWH ::

Custom Search

Home