large index. Since the index under the base configuration was not fragmented,
the execution engine could go through the index using sequential I/O, which
is very fast. In contrast, under the recommended configuration the execution
engine had to perform one random I/O every five disk blocks on average due to
fragmentation in the index, which resulted in a longer overall execution time.
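As a rough back-of-the-envelope illustration, the effect can be modeled as
follows. The five-blocks-per-seek figure comes from the measurement above;
the I/O latencies and index size are illustrative assumptions.

    # Assumed latencies: a random block read is ~50x a sequential one.
    SEQ_MS = 0.1      # assumed cost of one sequential block read (ms)
    RAND_MS = 5.0     # assumed cost of one random block read (ms)

    def scan_time_ms(blocks, blocks_per_seek):
        """Scan cost when fragmentation forces one random I/O (a seek)
        every `blocks_per_seek` sequentially read blocks."""
        seeks = blocks / blocks_per_seek
        return blocks * SEQ_MS + seeks * RAND_MS

    blocks = 1_000_000                                         # hypothetical index size
    print(scan_time_ms(blocks, blocks_per_seek=float("inf")))  # unfragmented: ~100 s
    print(scan_time_ms(blocks, blocks_per_seek=5))             # fragmented: ~1,100 s

Under these assumed latencies, one seek every five blocks makes the same
scan roughly an order of magnitude slower.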
Current DBMSs expose mechanisms to defragment indexes, which compact
data in pages to reduce internal fragmentation and reorder pages themselves
to reduce external fragmentation. Index defragmentation, however, is a heavy-
weight operation that can result in significant I/O cost and must therefore be
invoked with care. Since it is usually not feasible to defragment all indexes
within a typical batch window, database administrators (DBAs) today use rules
of thumb to select which indexes to defragment.
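Such a rule of thumb is typically data driven: it looks only at the physical
state of each index. A minimal sketch of such a policy appears below; the
Index record, the thresholds, and the page-count cost model are hypothetical,
chosen purely for illustration.

    from dataclasses import dataclass

    @dataclass
    class Index:
        name: str
        pages: int               # size of the B+-tree in disk pages
        fragmentation: float     # fraction of out-of-order pages (0.0-1.0)

    def pick_indexes_to_defragment(indexes, frag_threshold=0.30,
                                   min_pages=1_000, page_budget=500_000):
        """Purely data-driven rule of thumb: defragment the most
        fragmented indexes first, subject to size thresholds and a
        batch-window budget expressed in pages rewritten."""
        candidates = [ix for ix in indexes
                      if ix.fragmentation >= frag_threshold
                      and ix.pages >= min_pages]
        chosen = []
        for ix in sorted(candidates, key=lambda ix: ix.fragmentation,
                         reverse=True):
            if ix.pages <= page_budget:   # defrag cost grows with index size
                chosen.append(ix)
                page_budget -= ix.pages
        return chosen

Note that nothing in this policy considers which indexes the workload
actually scans, a limitation this section returns to below.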
The granularity at which index defragmentation is supported is the full
B+-tree, which can be very expensive for large indexes. In many cases, however,
the impact of fragmentation may not be uniformly distributed throughout
the B+-tree. For example, consider a clustered index on the date column of
a large fact table that stores order information. As new data are inserted
into the fact table, the B+-tree gets fragmented. Often, many of the queries
reference only recent data (e.g., the last month or quarter). Moreover,
defragmenting the index helps only the ranges that queries actually scan.
Thus, the ability to perform index defragmentation for a specific range on the
key column of a B+-tree (e.g., date > 06/30/2010) can provide most of the
benefits of defragmenting the full index at a lower cost. Additionally, while
data-driven approaches to index defragmentation are easy to understand and
implement, a purely data-driven approach can suggest defragmenting indexes
that have little or no impact on query performance, because such approaches
ignore potentially valuable workload information (i.e., information about
which queries scan the index).
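To make the range-level argument concrete, the sketch below applies a simple
cost/benefit model to the date-clustered fact table above. The partition
sizes, fragmentation levels, scan frequencies, and the linear benefit model
are all assumptions for illustration.

    from dataclasses import dataclass

    @dataclass
    class KeyRange:
        pages: int            # leaf pages covered by this key range
        fragmentation: float  # fraction of out-of-order pages
        scans_per_day: int    # how often the workload scans this range

    index_ranges = {
        "date <= 06/30/2010": KeyRange(pages=9_000_000,
                                       fragmentation=0.05, scans_per_day=1),
        "date >  06/30/2010": KeyRange(pages=1_000_000,
                                       fragmentation=0.40, scans_per_day=200),
    }

    # Defragmentation cost is roughly proportional to pages rewritten;
    # benefit accrues only on ranges the workload actually scans.
    for predicate, r in index_ranges.items():
        saved_random_ios = r.scans_per_day * r.pages * r.fragmentation
        print(f"{predicate}: cost = {r.pages:,} pages, "
              f"daily benefit ~ {saved_random_ios:,.0f} random I/Os avoided")

Under these assumed numbers, defragmenting only the recent range rewrites a
tenth of the pages yet captures well over 99 percent of the estimated daily
I/O savings.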
Using workload information can be crucial in large data warehouses consisting
of hundreds of indexes, as is typical in enterprise applications. Exploiting
this information is not straightforward, however: two key challenges make it
difficult for DBAs to use workload information for index defragmentation.
First, it is difficult to estimate the impact of defragmenting an
index on the I/O performance of a query that scans that index. Such what-if
analysis of the impact of defragmentation on query performance is an essential
component to enable a workload-driven approach to index defragmentation.
Second, even if the what-if analysis functionality is available, selecting which
indexes to defragment for large databases and workloads is in general a non-
trivial task.
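This second challenge resembles a knapsack-style optimization. The sketch
below shows one way such a selector could be structured, assuming a
hypothetical what-if primitive whatif_scan_cost(index, query, fragmentation)
that estimates a query's I/O cost at a given fragmentation level; neither
the primitive nor the greedy policy is an existing DBMS interface.

    from dataclasses import dataclass

    @dataclass
    class Index:
        name: str
        pages: int
        fragmentation: float

    @dataclass
    class Query:
        index_name: str
        def scans(self, ix):
            return self.index_name == ix.name

    def workload_benefit(ix, queries, whatif_scan_cost):
        """Estimated total I/O saved across the workload if `ix` were
        fully defragmented (fragmentation driven to zero)."""
        return sum(whatif_scan_cost(ix, q, ix.fragmentation)
                   - whatif_scan_cost(ix, q, 0.0)
                   for q in queries if q.scans(ix))

    def select_indexes(indexes, queries, whatif_scan_cost, page_budget):
        """Greedy selection by estimated benefit per page of rewrite
        work, until the batch-window budget (in pages) is exhausted."""
        ranked = sorted(indexes,
                        key=lambda ix: workload_benefit(
                            ix, queries, whatif_scan_cost) / ix.pages,
                        reverse=True)
        selected = []
        for ix in ranked:
            if ix.pages <= page_budget:
                selected.append(ix)
                page_budget -= ix.pages
        return selected

A full selector would also have to weigh choices at the range level rather
than whole indexes, which is part of what makes the problem nontrivial in
general.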
12.2.2 Compression
Data compression is commonly used in modern DBMSs for different reasons,
including reducing storage/archival costs (which is particularly important for