Database Reference
In-Depth Information
partitioning scheme with the highest benefit. We additionally define the
neighbors of a configuration C as all configurations that differ with C in
exactly one partitioning function. The enumeration algorithm proceeds
as follows. We keep a priority queue of unexplored configurations, which
we first populate with the initial configuration already described. Then,
until a time limit is reached, we (1) remove from the priority queue the
highest-ranked configuration, (2) perform a what-if optimization to ob-
tain the true benefit value of the configuration as described in Chapter 5,
(3) keep the current configuration if it is the best so far, and (4) insert
in the priority queue all unexplored configurations that are neighbors
of the current one. While we can use the benefit of a configuration C as
the ranking function in the priority queue, a refined formulation gives
better empirical results. In addition to the benefit of a configuration,
this refined rank of a configuration gives priority to partitions of large
tables and penalizes small incremental changes. Section 9.6 discusses
references that provide additional details on this ranking function as
well as various optimizations to the basic scheme.
Integrated Recommendation of Partitioning
Recommending horizontal partitioning in isolation makes sense in a parallel
DBMS, where the biggest performance overhead results from data movement
across servers. In traditional centralized settings, however, it is much more
important to integrate the recommendation of partitioning (both horizontal
and vertical) with that of other physical structures, like indexes and ma-
terialized views. The reason is that different aspects of physical design can
strongly interact. Experimental evaluation in the literature compares the best
recommendation obtained by two alternative procedures. The first one se-
lects the best (nonpartitioned) indexes and subsequently recommends how to
best horizontally partition them. The second alternative considers indexes and
horizontal partitioning together and returns the best recommendation overall.
Even for simple TPC-H queries, it has been observed that the execution times
under the recommendation produced by the integrated alternative can be sig-
nificantly faster than those under the alternative that recommends physical
structures in isolation. We next describe how we can recommend partitioning,
indexes, and materialized views in an integrated way.
Candidate generation. Generating candidates for vertical partitions is
complex because there are many different ways to divide the columns
of each table. To avoid a combinatorial explosion of candidates, we ap-
ply a pruning technique similar to that in Section 8.4.1 in the context
of materialized views. Specifically, the idea is to consider a subset of
columns C for partitioning only if the fraction of the cost of all queries
in the workload that reference C is above some threshold (and there-
fore C is deemed interesting). It has been shown experimentally that
Search WWH ::

Custom Search