Incorporating Other Physical Structures - Automated Physical Database Design and Tuning

Database Reference

In-Depth Information

The set of subcubes to materialize should always include the top subcube

(see Figure 9.2), because there is no other subcube that can be used to answer

the corresponding query without going to the raw data. Even under the sim-

plifying assumptions previously discussed, the problem is shown to be NP-

complete from a reduction from set cover. Interestingly enough, however, there

is a greedy solution for this problem with good quality guarantees. Before in-

troducing the greedy algorithm, we define the benefit of a subcube relative

to a set of subcubes. Specifically, after selecting some set S of subcubes, the

benefit of subcube s relative to S is defined as B

v s

(

s

,

S

) =

B v , where

u

(since the top subcube is in S , there must be at least one such subcube in

S ). That is, we compute the benefit value B of a subcube s by considering

how it can improve the cost of evaluating subcubes, including itself. Using

this definition, Figure 9.3 shows the greedy algorithm to select a set of k sub-

cubes to materialize, which selects, at each step, the subcube with the highest

benefit with respect to the set of subcubes that were already selected. It can

be shown that the greedy algorithm never has a benefit below 63% of that

of the optimal solution. We next discuss some extensions to the basic greedy

algorithm.

B v = max ( 0 , |

s

|−|

u

| ) , and u is the subcube of least cost in S such that v

Workload. We assumed that all queries in the workload are equally likely.

In general, we might want to associate a weight with each query propor-

tional to the frequency that the query is evaluated. We can extend the

greedy algorithm by multiplying each benefit value by the corresponding

weight, which results in the same performance guarantee.

Storage bound. The greedy algorithm optimizes the workload cost under

a constraint of a given number of subcubes. In reality, subcube sizes can

vary by orders of magnitude (see Figure 9.2), and therefore we might

want to optimize the cost of the workload for a given storage constraint

instead. In this case, we need to consider benefit values per unit of

storage in line 3 of Figure 9.3. In that way, if the largest aggregate

view uses a fraction f of the storage constraint, the resulting greedy

algorithm is guaranteed to result in a benefit of at least

(

0

.

63

−

f

)

.

greedySC ( T :Execution Plan)

1S= { top sub-cube }

2

for i=1 to kdo

s = sub-cube not in S that maximizes B ( s,S )

3

S=S ∪{ s }

4

return S

5

FIGURE 9.3

Greedy algorithm to recommend subcubes to materialize.

Automated Physical Database Design and Tuning

Search WWH ::

Custom Search

Home