Database Reference
In-Depth Information
introduced to take advantage of the block-oriented clustering and indexing
techniques. Consider a query that calculates the total sales of product with
ProductId = 100 over all dates and countries:
SELECT SUM(Price)
FROM Sales
WHERE ProductId = 100
This query can be eciently answered as follows. First, we process the dimen-
sion index on ProductId to locate the slice (i.e., block ids) that corresponds to
tuples satisfying ProductId = 100 . Then, we process all records in the block.
In this specific example, we can guarantee that each record in an accessed
block satisfies the predicate on ProductId . In general, due to computed dimen-
sion columns such as Date in the example (or additional predicates on nondi-
mension columns), we might need to do a postfiltering step to retrieve the valid
tuples. This operation is most effective when all or most tuples in a block need
to be processed for a given query. Note that the same MDC structure can e-
ciently answer a similar query with a predicate on Country , without the need
to reorganize the data differently. A clustered index on ProductId might be
comparable to the MDC in the figure for the query with a filter in ProductId
but would result in a full sequential scan for a predicate on Country .
Consider a variation of the previous query, with an additional predicate on
Country :
SELECT SUM(Price)
FROM Sales
WHERE ProductId = 100 AND Country = 'Canada'
In this case, we can consider traditional index intersection plans that op-
erate on blocks rather than tuples. Thus, we can inspect the dimension
index on ProductId obtaining the block ids that contain all tuples with
ProductId = 100 . We then do the same with the dimension index on Country
for the block ids containing tuples that satisfy Country = Canada . We finally
intersect those block ids and explore only the resulting ones that satisfy both
predicates. Since dimension indexes are typically much smaller than the tuple-
based ones, the intersection can be done eciently by using bit-vector filtering
techniques. In addition to index intersection, other, more sophisticated tech-
niques (e.g., index union) are also possible.
9.3.3 Recommending MDCs
As we illustrated in the previous section, the performance of multidimensional
queries is often improved through data clustering, which can significantly re-
duce input/output (I/O) cost. Yet the choice of clustering dimensions and the
granularity of the clustering (i.e., which computed column to use) are nontriv-
ial choices, and a bad design might even result in performance degradation.
We first note that MDCs generally result in an increase in storage. The reason
Search WWH ::




Custom Search