more slowly! Had the wizard been given both the queries and the inserts to analyze, it might have recommended only one or two indexes, solving the bulk of the problem (speeding the queries by, say, a factor of 80) while incurring a much smaller penalty for insert processing. In fact, the best tools are designed precisely to evaluate these tradeoffs, but they can only do so if given a broad view of the workload that the system will encounter.
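The tradeoff described above can be sketched with a toy cost model; the function, the statement counts, and the per-statement costs below are all illustrative assumptions, not figures from the text (only the 80x query speedup is taken from it):

```python
# A hypothetical cost model: an index speeds up the queries it serves
# but adds overhead to every insert that must maintain it.

def workload_cost(n_queries, n_inserts, query_cost, insert_cost,
                  query_speedup=1.0, insert_penalty=1.0):
    """Total cost of a daily workload under a candidate index design."""
    return (n_queries * query_cost / query_speedup
            + n_inserts * insert_cost * insert_penalty)

# Without any index (assumed unit costs).
base = workload_cost(1_000, 50_000, query_cost=80.0, insert_cost=1.0)

# One well-chosen index: queries 80x faster, inserts 20% slower (assumed).
tuned = workload_cost(1_000, 50_000, query_cost=80.0, insert_cost=1.0,
                      query_speedup=80.0, insert_penalty=1.2)

print(base, tuned)  # the tuned design wins only because the insert
                    # penalty is weighed against the query gain
```

A wizard shown only the queries would maximize `query_speedup` at any `insert_penalty`; seeing both sides of the workload is what lets it minimize the total.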
The need to provide design utilities with a broad view of the workload raises serious issues of its own. Many real-world workloads include thousands, if not hundreds of thousands, of statements executed per day. Some statements may differ only slightly in their syntax, while others are radically different. This poses a serious problem for the utilities, since the time required to analyze the workload and produce recommendations grows proportionally, and sometimes exponentially, with the size of the workload, the number of tables, and the number of predicates and joins. The impact can be dramatic, causing the analysis to consume several days of execution time for even a modest workload of a few dozen statements.
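The exponential growth is easy to see from the size of the search space alone. The sketch below (column names are made up for illustration) enumerates every subset of a table's columns as a candidate index key set; an exhaustive design search over n columns faces 2^n such alternatives, before multiplying by the statements in the workload:

```python
from itertools import combinations

def index_candidates(columns):
    """Enumerate every subset of columns as a candidate index key set.
    An exhaustive search over n columns has 2**n alternatives to cost."""
    cands = []
    for k in range(len(columns) + 1):
        cands.extend(combinations(columns, k))
    return cands

print(len(index_candidates(["region", "qty", "date", "product"])))  # 16 = 2**4
print(len(index_candidates(["c%d" % i for i in range(20)])))        # 1048576 = 2**20
```

Doubling the column count squares the candidate space, which is why the utilities cannot afford to evaluate every alternative naively.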
In an internal test using two different database products,⁵ physical design utilities were compared against the same database data and the same workloads to test their reaction to workload size. The utilities searched for both indexes and materialized views.
The tests used a synthetic workload against a classic MOLAP schema. The fact table
had 8 measures and 16 hierarchical dimensions. In this experiment the design utilities
were initially given 40 statements to analyze, and the workload size was subsequently increased to 60, 80, 100, and 150 statements. Each time the workload was increased, the design utilities were rerun and the time to generate a recommendation was recorded. As shown in Figure 12.17, the utility with workload compression showed a linear increase in recommendation time, scaling from roughly 20 minutes with 40 statements in the workload to over an hour with 150 statements. In contrast, the utility without workload compression performed similarly on the initial 40-statement workload, but as the workload size increased, its recommendation time grew exponentially. With 150 statements, the utility without workload compression required over four days of analysis before returning a result!
There are a number of techniques that vendors use to avoid this explosive growth in the execution time of the wizards. The first is to use an intelligent search algorithm that traverses the domain of possible solutions, trying to find an optimal or near-optimal solution without evaluating every possible alternative. Various techniques are used for this, including random sampling, genetic algorithms, and Z-order evaluation. Another key strategy is to reduce the complexity of the workload being analyzed in a way that does not significantly degrade the quality of the analysis and recommendations. Doing so is called workload compression.
⁵ Due to licensing agreements, the names of the database products are deliberately omitted. The tests were courtesy of Scott J. Fadden, IBM Systems & Technology Group, Development GPFS Performance Engineer.
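One simple way to compress a workload, sketched below under the assumption that statements differing only in their literal constants can share one analysis, is to normalize each statement into a template and keep a single weighted representative per template (the regular expressions and the sample statements are illustrative, not from the text):

```python
import re
from collections import Counter

def template(sql):
    """Replace literals with placeholders so statements that differ
    only in constants collapse to one template (a naive sketch)."""
    t = re.sub(r"'[^']*'", "?", sql)   # string literals
    t = re.sub(r"\b\d+\b", "?", t)     # numeric literals
    return re.sub(r"\s+", " ", t).strip().lower()

def compress(workload):
    """Return one representative per template, weighted by frequency,
    so the design utility analyzes templates instead of raw statements."""
    return Counter(template(s) for s in workload)

workload = [
    "SELECT * FROM sales WHERE region = 'EAST' AND qty > 10",
    "SELECT * FROM sales WHERE region = 'WEST' AND qty > 25",
    "INSERT INTO sales VALUES (1, 'EAST', 30)",
]
print(len(compress(workload)))  # 2 templates instead of 3 statements
```

On a real workload of hundreds of thousands of statements, collapsing near-duplicates this way is what turns an exponential analysis into the near-linear behavior reported for the compressing utility.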