Database Reference
In-Depth Information
WAH gains its speed partly from its simplicity. For long sequences of 0s
or 1s, it uses run-length encoding to represent them, and for relatively short
sequences of mixed 0s and 1s, it represents the bits literally. Hence, it is a
hybrid of two methods. Another key feature that enables it to achieve perfor-
mance is that the compressed data are word aligned. More specifically, WAH-
compressed data contains two types of words: literal words and fill words.
A literal word contains one bit to indicate its type and uses the remaining
bits to store the bitmap literally. On a 32-bit word system, it may use the
most significant bit to indicate the type of the word, and use the remaining
31 bits to store the bitmap. A fill word similarly needs 1 bit to indicate its
type. It uses another bit to indicate whether the bits are all 0s or all 1s,
and the remaining bits are used to store the number of bits in a bitmap it
represents. The number of bits represented by a WAH fill word is always a
multiple of the number of bits stored in a literal word. Therefore, the length
of a fill is stored as this multiple instead of the actual number of bits. For
example, a fill of 62 bits will be recorded as being of length 2 because it is
two times the number of bits that can be stored in a literal word (31). This
explicitly enforces the word-alignment requirement and allows one to easily
figure out how many literal words a fill word corresponds to during a bitwise
logical operation. Another important property is that it allows one to store
any fill in a single fill word as long as the number of bits in a bitmap can be
stored in a word. This is an important property that simplifies the theoretical
analysis of WAH compression. 103 An illustration of WAH compression on a
32-bit machine is shown in Figure 6.16a.
Figure 6.16b shows some examples to illustrate the effects of compression
on overall query response time. In this case the commercial DBMS implemen-
tation of compressed bitmap index (marked as “DBMS bitmap index”) uses
BBC compression, while “FastBit index” uses WAH compression. The query
response times reported are average time values over thousands of ad hoc
range queries that produce the same number of hits. Over the whole range of
different numbers of hits, the WAH compressed indexes answer queries about
14 times faster than the commercial bitmap indexes.
In addition to being ecient in timing measurements, WAH-compressed
basic bitmap index is also theoretically optimal. In the worst case, the query
response time is a linear function of the number of hits according to our
analysis in References 102 and 103. A few of the best B-tree variants have the
same theoretical optimality as the WAH-compressed bitmap index. 24 However,
bitmap indexes are much faster in answering queries that return more than a
handful of hits as illustrated in Figure 6.16. Since the basic bitmap index con-
tains the same information as a typical B-tree, it is possible to switch between
bitmaps and RID lists to always use the more compact representation as sug-
gested in the literature. 65 This is an alternative form of compression that was
found to perform quite well in a comparison with WAH-compressed indexes. 63
The bitmap compression methods are designed to reduce the index sizes,
and they are quite effective at this. Discussions on how each compression
Search WWH ::




Custom Search