Database Reference
In-Depth Information
the storage space required but also the cost of refreshing the summary tables.
On the other hand, it implies that any possible aggregate query will match a
summary table; thus, the cost of answering the query would just be a table
scan which, in addition, will often be of a small size. The “no materialization”
approach is likely to be inecient in most real-world situations. It follows that
a good trade-off between these options can be to materialize only a portion of
the views in the data cube. The main problem in this case is to decide which
views are going to be materialized. Notice that once we decide which are the
views to materialize, we can apply the techniques for cube computation and
maintenance already studied in this chapter. Actually, the problem could be
stated in many ways:
￿ How many views must we materialize to get reasonable performance?
￿ Given a certain amount of storage space, which views should we materialize
in order to minimize the average query cost?
￿ If we can assume an X% performance degradation with respect to a fully
materialized data cube, how much space do we save?
We next explain a classic greedy algorithm that finds, given a cube lattice,
the best set of views to materialize under a certain criterion. Although the
set of views returned by the algorithm may not always be the optimal one,
we have chosen this algorithm as representative of a class of algorithms that
aim at solving the same problem.
The algorithm makes use of a lattice that takes into account two kinds
of dependencies between nodes. The first kind of dependency accounts for
the case in which the attributes of a view are included in those of another
view. For example, in the lattice representing the possible aggregations of the
fact table Sales ( ProductKey , CustomerKey , TimeKey ), there is a dependency
between the node ( ProductKey , CustomerKey )andthenode( ProductKey ),
stating that the latter can be computed from the former since
{
ProductKey
}⊆
{
holds. The second kind of dependency accounts
for hierarchies. For example, given a hierarchy Month
ProductKey , CustomerKey
}
Year ,ifwehavean
aggregation over Month , we can use it to compute the aggregation over Year
without going down to the fact table. Thus, the dependency lattice represents
a relation v i v j between the views such that a view v i can be answered
using v j . For simplicity, and without loss of generalization, in the examples
of this section, we only consider the case in which the attributes of a view
are included in those of another view.
The view selection algorithm is based on calculating the costs of computing
the views in the lattice. A linear cost model with the following characteristics
is assumed:
￿ The cost of answering a view v from a materialized view v m is the number
of rows in v m .
￿ All queries are identical to some view in the dependency lattice.
Search WWH ::




Custom Search