Database Reference
In-Depth Information
The set of subcubes to materialize should always include the top subcube
(see Figure 9.2), because there is no other subcube that can be used to answer
the corresponding query without going to the raw data. Even under the sim-
plifying assumptions previously discussed, the problem is shown to be NP-
complete from a reduction from set cover. Interestingly enough, however, there
is a greedy solution for this problem with good quality guarantees. Before in-
troducing the greedy algorithm, we define the benefit of a subcube relative
to a set of subcubes. Specifically, after selecting some set S of subcubes, the
benefit of subcube s relative to S is defined as B
v s
(
s
,
S
) =
B v , where
u
(since the top subcube is in S , there must be at least one such subcube in
S ). That is, we compute the benefit value B of a subcube s by considering
how it can improve the cost of evaluating subcubes, including itself. Using
this definition, Figure 9.3 shows the greedy algorithm to select a set of k sub-
cubes to materialize, which selects, at each step, the subcube with the highest
benefit with respect to the set of subcubes that were already selected. It can
be shown that the greedy algorithm never has a benefit below 63% of that
of the optimal solution. We next discuss some extensions to the basic greedy
algorithm.
B v = max ( 0 , |
s
|−|
u
| ) , and u is the subcube of least cost in S such that v
Workload. We assumed that all queries in the workload are equally likely.
In general, we might want to associate a weight with each query propor-
tional to the frequency that the query is evaluated. We can extend the
greedy algorithm by multiplying each benefit value by the corresponding
weight, which results in the same performance guarantee.
Storage bound. The greedy algorithm optimizes the workload cost under
a constraint of a given number of subcubes. In reality, subcube sizes can
vary by orders of magnitude (see Figure 9.2), and therefore we might
want to optimize the cost of the workload for a given storage constraint
instead. In this case, we need to consider benefit values per unit of
storage in line 3 of Figure 9.3. In that way, if the largest aggregate
view uses a fraction f of the storage constraint, the resulting greedy
algorithm is guaranteed to result in a benefit of at least
(
0
.
63
f
)
.
greedySC ( T :Execution Plan)
1S= { top sub-cube }
2
for i=1 to kdo
s = sub-cube not in S that maximizes B ( s,S )
3
S=S ∪{ s }
4
return S
5
FIGURE 9.3
Greedy algorithm to recommend subcubes to materialize.
Search WWH ::




Custom Search