Database Reference
In-Depth Information
values p2 , p4 ,and p6 . We then access the bitmap vectors with these labels
in Fig. 7.13 f, thus performing a join between Product (Fig. 7.13 a) and Sales .
Only the vectors labeled p2 and p4 match the search since there is no fact
record for product p6 . The third, fourth, and sixth rows in the fact table are
the answer since they are the only ones with a '1' in the corresponding vectors
in Fig. 7.13 f. We then obtain the key values for the CustomerKey ( c2 and c3 )
using the bitmap index in Fig. 7.13 e. With these values we search in the B + -
tree index over the keys in tables Product and Customer to find the names of
the products and the customer satisfying the query condition. Note that this
performs the join between the dimensions and the fact table. As we can see
in Figs. 7.10 aand 7.13 c, the records correspond to the names cust2 , cust3 ,
prod2 ,and prod4 , respectively. Finally, the query answer is ( cust2 , prod2 , 200 )
and ( cust3 , prod4 , 100 ).
Note that the last join with Customer would not be needed if the query
would have been of the following form:
SELECT S.CustomerKey, P.ProductKey, SUM(SalesAmount)
FROM Sales S, Product P
WHERE S.ProductKey = P.ProductKey AND P.Discontinued = ' Yes '
GROUP BY S.CustomerKey, P.ProductKey
The query above only mentions attributes in the fact table Sales .Thus,the
only join that needs to be performed is the one between Product and Sales .
We illustrate now the evaluation of star queries using bitmap join indexes.
We have seen that the main idea is to create a bitmap index over a fact
table using an attribute belonging to a dimension table, precomputing the
join between both tables and building a bitmap index over the latter.
Figure 7.13 g shows the bitmap join index between Sales and Product
over the attribute Discontinued . Finding the facts corresponding to sales
of discontinued products, as required by the query under study, is now
straightforward: we just need to find the vector labeled ' Yes ', and look for the
bits set to '1'. During query evaluation, this avoids the first step described in
the previous section, when evaluating the query with bitmap indexes. This
is done at the expense of the cost of (off-line) precomputation.
Note that this strategy can reduce dramatically the evaluation cost if in
the SELECT clause there are no dimension attributes, and thus, we do not
need to join back with the dimensions using the B + -tree as explained above.
Thus, the answer for the alternative query above would just require a simple
scan of the Sales table, in the worst case.
7.7 Data Warehouse Partitioning
In a database, partitioning or fragmentation divides a table into smaller
data sets (each one called a partition) to better support the management
and processing of very large volumes of data. Partitioning can be applied
to tables as well as to indexes. Further, a partitioned index can be
Search WWH ::




Custom Search