Incorporating Other Physical Structures - Automated Physical Database Design and Tuning

Database Reference

In-Depth Information

introduced to take advantage of the block-oriented clustering and indexing

techniques. Consider a query that calculates the total sales of product with

ProductId = 100 over all dates and countries:

SELECT SUM(Price)

FROM Sales

WHERE ProductId = 100

This query can be eciently answered as follows. First, we process the dimen-

sion index on ProductId to locate the slice (i.e., block ids) that corresponds to

tuples satisfying ProductId = 100 . Then, we process all records in the block.

In this specific example, we can guarantee that each record in an accessed

block satisfies the predicate on ProductId . In general, due to computed dimen-

sion columns such as Date in the example (or additional predicates on nondi-

mension columns), we might need to do a postfiltering step to retrieve the valid

tuples. This operation is most effective when all or most tuples in a block need

to be processed for a given query. Note that the same MDC structure can e-

ciently answer a similar query with a predicate on Country , without the need

to reorganize the data differently. A clustered index on ProductId might be

comparable to the MDC in the figure for the query with a filter in ProductId

but would result in a full sequential scan for a predicate on Country .

Consider a variation of the previous query, with an additional predicate on

Country :

SELECT SUM(Price)

FROM Sales

WHERE ProductId = 100 AND Country = 'Canada'

In this case, we can consider traditional index intersection plans that op-

erate on blocks rather than tuples. Thus, we can inspect the dimension

index on ProductId obtaining the block ids that contain all tuples with

ProductId = 100 . We then do the same with the dimension index on Country

for the block ids containing tuples that satisfy Country = Canada . We finally

intersect those block ids and explore only the resulting ones that satisfy both

predicates. Since dimension indexes are typically much smaller than the tuple-

based ones, the intersection can be done eciently by using bit-vector filtering

techniques. In addition to index intersection, other, more sophisticated tech-

niques (e.g., index union) are also possible.

9.3.3 Recommending MDCs

As we illustrated in the previous section, the performance of multidimensional

queries is often improved through data clustering, which can significantly re-

duce input/output (I/O) cost. Yet the choice of clustering dimensions and the

granularity of the clustering (i.e., which computed column to use) are nontriv-

ial choices, and a bad design might even result in performance degradation.

We first note that MDCs generally result in an increase in storage. The reason

Search WWH ::

Custom Search

Home