Database Reference
In-Depth Information
￿ Ecient batch update: As already said, updates are not so critical in OLAP
systems, which allows more columns to be indexed. However, the refreshing
time of a data warehouse must be taken into account when designing the
indexing schema. Indeed, the time needed for reconstructing the indexes
after the refreshing extends the downtime of the warehouse.
￿ Sparse data: Typically, only 20% of the cells in a data cube are nonempty.
The indexing schema must thus be able to deal eciently with sparse and
nonsparse data.
To cope with these requirements, two kinds of indexes are commonly used
in data warehouse systems: bitmap indexes and join indexes. We study these
indexes next.
7.5.1 Bitmap Indexes
Consider the table Product in Fig. 7.10 a. For clarity, we assume a simplified
example with only six products. We show next how to build a bitmap index
on attributes QuantityPerUnit and UnitPrice . There are, respectively, four and
five possible values for these attributes in table Product .Wecreateabit
vector of length 6 (the number of rows in Product ) for each possible attribute
value, as shown in Fig. 7.10 b,c. In a position i of vector j , there is a '1' if the
product in row i has the value in the label of column j , and a '0' otherwise.
For example, in the first row of the table in Fig. 7.10 b, there is a '1' in the
vector with label 25, indicating that the corresponding product ( p1 )hasa
value 25 in attribute QuantityPerUnit . Note that we have included the product
key in the first column of the bitmap index to facilitate the reading, although
this column is not part of the index.
Now, assume the query “Products with unit price equal to 75.” A query
processor will just need to know that there is a bitmap index over UnitPrice in
Product and look for the bit vector with a value of 75. The vector positions
where a '1' is found indicate the positions of the records that satisfy the
query, in this case, the third row in the table.
For queries involving a search range, the process is a little bit more
involved. Consider the query “Products having between 45 and 55 pieces
per unit, and with a unit price between 100 and 200.” To compute this
query, we first look for the index over QuantityPerUnit and the bit vectors
with labels between 45 and 55. There are two such vectors, with labels 45
and 50. The products having between 45 and 55 pieces per unit are the ones
corresponding to an OR operation between these vectors. Then, we look for
the index over UnitPrice and the bit vectors with labels between 100 and 200.
There are three such vectors, with labels 100, 110, and 120. The products
having unit price between 100 and 200 are, again, the ones corresponding to
an OR operation between these vectors. We obtain the two vectors labeled
OR1 and OR2 in Fig. 7.11 a,b, respectively. Finally, an AND between these
Search WWH ::




Custom Search