Incorporating Other Physical Structures - Automated Physical Database Design and Tuning

Database Reference

In-Depth Information

be done after the partitioning scheme has been decided. We now describe the

steps involved in recommending hash-based partitioning in a parallel DBMS.

Candidate selection. We identify all candidate columns in the workload

that are part of equality predicates or group-by clauses. As explained in

Section 4.1, candidate selection can be done by either parsing the input

queries or the output execution plans or instrumenting the optimizer. In

addition to all candidate partitioning schemes identified in this manner,

we also consider the option of replicating some tables across all servers,

which can reduce communication overhead. Since replication incurs stor-

age overhead, we consider replicating only tables whose sizes are below a

threshold. Some optimizers extend the notion of interesting orders (see

Section 2.3.1) to also generate interesting partitioning columns. Subop-

timal plans that can exploit some interesting partitioning column are

kept during optimization because they can be part of the optimal plan

overall. Then, the optimizer is instrumented to simulate the availability

of all partitioning and replication schemes simultaneously, so that after

optimization we can obtain the optimal partitioning strategy for a given

query. ∗

Merging. Consider a table T , and suppose that the best candidate par-

titioning for query q 1 is over T

(

a

,

b

)

and that the best candidate

partitioning for query q 2 is over T

(

a

,

c

)

. In this case, a hash-based par-

could be the best partitioning overall, as it benefits

both queries. In general, merging generates new partitioning candidates

with columns that are a common subset of two or more candidates

identified in the previous step. To avoid skewed distributions, we do not

attempt to merge partitioning candidates if the resulting number of key

values is under a given threshold.

Enumeration. Before enumeration starts, we compute a benefit value for

each candidate partitioning. The benefit of a candidate partitioning P

is obtained as the difference in cost, for all queries that picked P as a

candidate partitioning, between the configuration that can leverage all

possible partitioning candidates (including P ) and the one for which

no partitioning is available. Merged partitioning candidates inherit the

sum of benefits of their respective parents. Note that this benefit value

is just an approximation of the true reduction in cost for a given par-

titioning candidate, since it assigns the benefit of the whole query to

all useful partitioning candidates equally. Using benefit values, we de-

fine the initial configuration as the one that uses, for each table, the

titioning over T

(

a

)

∗ We assume that the same table is not repeated in the query, and therefore the optimal plan

contains a single partitioning scheme for each table. In the general case the technique needs to

be extended, as discussed in Section 9.6.

Automated Physical Database Design and Tuning

Search WWH ::

Custom Search

Home