has a BBC compressed bitmap index, IBM DB2 has the Encoded Vector
Index, IBM Informix products have two versions of bitmap indexes (one for
low-cardinality data and one for high-cardinality data), and Sybase IQ data
warehousing products have two versions of bitmap indexes as well. These
bitmap index implementations are based on either the basic bitmap index or
the bit-sliced index, which are the two best choices among all multicomponent
bitmap indexes [110].
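To make the distinction between the two index types concrete, the following is a minimal sketch in Python, not taken from any of the products named above. It assumes a column of small non-negative integers and uses Python integers as uncompressed bitmaps (bit i corresponds to row i). The function names are hypothetical. The basic bitmap index keeps one bitmap per distinct value, while the bit-sliced index keeps one bitmap per binary digit of the value and answers an equality query by combining slices with AND and AND NOT.

```python
from collections import defaultdict


def build_basic_bitmap_index(column):
    """One bitmap per distinct value; bit i is set when row i holds that value."""
    index = defaultdict(int)
    for row, value in enumerate(column):
        index[value] |= 1 << row
    return dict(index)


def build_bit_sliced_index(column):
    """One bitmap per binary digit; assumes non-negative integer values."""
    n_slices = max(column).bit_length() if column else 0
    n_slices = max(n_slices, 1)  # keep at least one slice, even for all zeros
    slices = [0] * n_slices
    for row, value in enumerate(column):
        for k in range(n_slices):
            if (value >> k) & 1:
                slices[k] |= 1 << row
    return slices


def bit_sliced_equals(slices, n_rows, target):
    """Bitmap of rows equal to target, built from the slices alone."""
    if target >> len(slices):
        return 0  # target needs more bits than any indexed value has
    all_rows = (1 << n_rows) - 1
    result = all_rows
    for k, bitmap in enumerate(slices):
        # Keep rows whose k-th binary digit matches the k-th digit of target.
        result &= bitmap if (target >> k) & 1 else (all_rows & ~bitmap)
    return result


def rows_of(bitmap):
    """List the row numbers whose bits are set in the bitmap."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


column = [3, 0, 5, 3, 1, 5, 5]
basic = build_basic_bitmap_index(column)
sliced = build_bit_sliced_index(column)

print(rows_of(basic[5]))
print(rows_of(bit_sliced_equals(sliced, len(column), 5)))
```

In this sketch the basic index needs one bitmap for each of the four distinct values, whereas the bit-sliced index needs only three bitmaps because the largest value fits in three binary digits. The price is that an equality query on the bit-sliced index touches every slice instead of a single bitmap.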
A number of research prototypes also implement a variety of bitmap
indexes [63, 106]. In particular, FastBit is freely available for anyone to use and
extend. We next briefly describe some of the key features of the FastBit software.
FastBit is distributed as C++ source code and can be easily integrated into
a data processing system. On its own, it behaves as a minimalistic data
warehousing system with column-oriented data organization. Its strongest feature
is a comprehensive set of bitmap indexing functions that include innovative
techniques in all three categories discussed above. For compression, FastBit
offers WAH as well as the option to leave some bitmaps uncompressed. For encoding,
FastBit implements all four theoretically optimal compressed bitmap indexes
in addition to a slew of bitmap encodings proposed in the research literature.
For binning, it offers the unique low-precision binning as well as a large set
of common binning options such as equal-width, equal-weight, and log-scale
binning. Because of the extensive indexing options available, it is a good tool
for conducting research in indexing. In 2007, two PhD theses involving FastBit
were completed, demonstrating its usefulness as a research tool [72, 85].
FastBit has also been used in the drug screening software TrixX-BMI, where it
was shown to speed up virtual screening by a factor of 12 on average in one
case and by hundreds of times in another [80]. Chapter 9, on visualization,
describes an application of FastBit to network traffic analysis. Later, in
Section 6.7.3, we briefly describe its use in the analysis of high-energy
physics data.
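The common binning options named above (equal-width, equal-weight, and log-scale) differ only in how the bin boundaries are chosen. The following is a rough sketch in plain Python of how such boundaries might be computed. It does not use the FastBit API, all function names are hypothetical, and the low-precision binning mentioned above is not shown. The log-scale variant assumes strictly positive values.

```python
import bisect
import math


def equal_width_edges(values, n_bins):
    """Boundaries that split the value range into bins of equal width."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / n_bins
    return [lo + i * step for i in range(n_bins + 1)]


def equal_weight_edges(values, n_bins):
    """Boundaries chosen so each bin holds roughly the same number of values."""
    ordered = sorted(values)
    n = len(ordered)
    edges = [ordered[(i * n) // n_bins] for i in range(n_bins)]
    edges.append(ordered[-1])
    return edges


def log_scale_edges(values, n_bins):
    """Boundaries evenly spaced in log10; assumes all values are positive."""
    lo, hi = math.log10(min(values)), math.log10(max(values))
    step = (hi - lo) / n_bins
    return [10 ** (lo + i * step) for i in range(n_bins + 1)]


def bin_counts(values, edges):
    """Count values per bin; bins are [edge_i, edge_i+1), the last one closed."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        k = bisect.bisect_right(edges, v) - 1
        k = min(max(k, 0), len(counts) - 1)  # clamp values at the outer edges
        counts[k] += 1
    return counts


# A skewed sample: most values are small, with two large outliers.
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 100, 1000]

for make_edges in (equal_width_edges, equal_weight_edges, log_scale_edges):
    edges = make_edges(values, 3)
    print(make_edges.__name__, edges, bin_counts(values, edges))
```

On skewed data such as this sample, equal-width binning places almost every value in the first bin, equal-weight binning balances the bin populations, and log-scale binning falls in between. Since a binned bitmap index keeps one bitmap per bin, the choice of boundaries determines how many rows each bitmap covers.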
6.7 Data Organization and Parallelization
In this section, we briefly review a number of data management systems to
discuss the different aspects of data organization and their impact on query
performance. Since many of the systems are parallel systems, we also touch
on the issue of parallelization. Most of the systems reviewed here do not have
extensive indexing support. We also present a small test comparing one of
these systems against FastBit to demonstrate that indexing can improve
query performance. Finally, we discuss the Grid Collector as an example of a
smart iterator that combines indexing methods with parallel data processing
to significantly speed up large-scale data analysis.