Database Reference
In-Depth Information
where indexes can dramatically speed up the query processing. For exam-
ple, for interactive exploration, one might select a few million records from
a dataset with billions of records. Going through the whole dataset to find a
few million is likely to take longer than using an effective index. Another ex-
ample is in query estimation. Often, before users commit to evaluate subsets
of data records, they may want to know the size of the result and the pos-
sible processing time. This task is usually better accomplished with indexes.
Additionally, indexes may provide a faster way to gather some statistics. To
illustrate the usefulness of an index in accomplishing such tasks, we next give
a specific example of answering count queries from the Set Query Benchmark.
The Set Query Benchmark is a benchmark designed to evaluate the perfor-
mance of OLAP-type applications. 63 , 67 The test data contains 12 columns of
uniform random numbers plus a sequence number that serves as a row identi-
fier. Our test uses queries of the form “ Select count(*) From Bench Where ... ,”
which we call count queries. 63 We use a test dataset with 100 million rows in-
stead of the original specification of a million rows. In the test dataset, we also
adjust the column cardinality of the two random columns to be 25 million and
50 million instead of 250,000 and 500,000 as the benchmark specification indi-
cated. We note that such uniform random values with extremely high column
cardinalities are the worst type of data for the compressed bitmap indexes.
The table in Figure 6.19 contains the names of these count queries, Q1, Q2A,
Q2B, and so on. Each of these queries has a number of different instances that
involve different columns or different query conditions. Figure 6.19a shows the
total time to complete all instances of a query, and Figure 6.19b shows the
query response time for each instance of Q1.
The query response times are gathered from two systems: one with a com-
pressed bitmap index and the other with compressed columns but without
any index. The bitmap index is from FastBit and the indexless system is a
commercial product that reorders and compresses the base data. This index-
less system is advertised as the fastest data processing system. During our
testing, we consulted with the vendor to obtain the best ordering and com-
pression options available in early 2007. The two systems are run on the same
computer with a 2.8 GHz Intel Pentium CPU and a small hardware RAID
that is capable of supporting about 60 MB/s throughput.
The time results in Figure 6.19 clearly indicate that indexes are useful for
these queries. Overall, one can answer all the queries about 11 times faster
using bitmap indexes than using the compressed base data. Figure 6.19b shows
some performance details that help to explain the observed differences. The
horizontal axis in Figure 6.19b is the column cardinality. The best reordering
strategy that minimizes the overall query response time is to order the lowest
cardinality column first, then the next lowest cardinality column, and so on.
More specifically, the column with cardinality 2 (the lowest cardinality) is
completely sorted; when the lowest cardinality column values are the same,
the rows are ordered according to the column with cardinality 4; when the first
two columns have the same value, the rows are ordered according to the next
Search WWH ::




Custom Search