large index. Since the index under the base configuration was not fragmented,
the execution engine could go through the index using sequential I/O, which
is very fast. In contrast, under the recommended configuration the execution
engine had to perform one random I/O every five disk blocks on average due to
fragmentation in the index, which resulted in a longer overall execution time.
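As a rough back-of-the-envelope illustration, the effect can be modeled as
follows. The five-blocks-per-seek figure comes from the measurement above;
the I/O latencies and index size are illustrative assumptions.

    # Assumed latencies: a random block read is ~50x a sequential one.
    SEQ_MS = 0.1      # assumed cost of one sequential block read (ms)
    RAND_MS = 5.0     # assumed cost of one random block read (ms)

    def scan_time_ms(blocks, blocks_per_seek):
        """Scan cost when fragmentation forces one random I/O (a seek)
        every `blocks_per_seek` sequentially read blocks."""
        seeks = blocks / blocks_per_seek
        return blocks * SEQ_MS + seeks * RAND_MS

    blocks = 1_000_000                                         # hypothetical index size
    print(scan_time_ms(blocks, blocks_per_seek=float("inf")))  # unfragmented: ~100 s
    print(scan_time_ms(blocks, blocks_per_seek=5))             # fragmented: ~1,100 s

Under these assumed latencies, one seek every five blocks makes the same
scan roughly an order of magnitude slower.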
Current DBMSs expose mechanisms to defragment indexes, which compact
data in pages to reduce internal fragmentation and reorder pages themselves
to reduce external fragmentation. Index defragmentation, however, is a heavy-
weight operation that can result in significant I/O cost and must therefore be
invoked with care. Since it is usually not feasible to defragment all indexes
within a typical batch window, database administrators (DBAs) today use rules
of thumb to select which indexes to defragment.
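Such a rule of thumb is typically data driven: it looks only at the physical
state of each index. A minimal sketch of such a policy appears below; the
Index record, the thresholds, and the page-count cost model are hypothetical,
chosen purely for illustration.

    from dataclasses import dataclass

    @dataclass
    class Index:
        name: str
        pages: int               # size of the B+-tree in disk pages
        fragmentation: float     # fraction of out-of-order pages (0.0-1.0)

    def pick_indexes_to_defragment(indexes, frag_threshold=0.30,
                                   min_pages=1_000, page_budget=500_000):
        """Purely data-driven rule of thumb: defragment the most
        fragmented indexes first, subject to size thresholds and a
        batch-window budget expressed in pages rewritten."""
        candidates = [ix for ix in indexes
                      if ix.fragmentation >= frag_threshold
                      and ix.pages >= min_pages]
        chosen = []
        for ix in sorted(candidates, key=lambda ix: ix.fragmentation,
                         reverse=True):
            if ix.pages <= page_budget:   # defrag cost grows with index size
                chosen.append(ix)
                page_budget -= ix.pages
        return chosen

Note that nothing in this policy considers which indexes the workload
actually scans, a limitation this section returns to below.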
The granularity at which index defragmentation is supported is the full
B+-tree, which can be very expensive for large indexes. In many cases, however,
the impact of fragmentation may not be uniformly distributed throughout
the B+-tree. For example, consider a clustered index on the date column of
a large fact table that stores order information. As new data are inserted
into the fact table, the B+-tree gets fragmented. Often, many of the queries
reference only recent data (e.g., the last month or quarter). Moreover,
defragmenting the index helps only the ranges that queries actually scan.
Thus, the ability to perform index defragmentation for a specific range on the
key column of a B+-tree (e.g., date > 06/30/2010) can provide most of the
benefits of defragmenting the full index at a lower cost. Additionally, while
data-driven approaches to index defragmentation are easy to understand and
implement, a purely data-driven approach can suggest defragmenting indexes
that have little or no impact on query performance, because such approaches
ignore potentially valuable workload information (i.e., information about
which queries scan the index).
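To make the range-level argument concrete, the sketch below applies a simple
cost/benefit model to the date-clustered fact table above. The partition
sizes, fragmentation levels, scan frequencies, and the linear benefit model
are all assumptions for illustration.

    from dataclasses import dataclass

    @dataclass
    class KeyRange:
        pages: int            # leaf pages covered by this key range
        fragmentation: float  # fraction of out-of-order pages
        scans_per_day: int    # how often the workload scans this range

    index_ranges = {
        "date <= 06/30/2010": KeyRange(pages=9_000_000,
                                       fragmentation=0.05, scans_per_day=1),
        "date >  06/30/2010": KeyRange(pages=1_000_000,
                                       fragmentation=0.40, scans_per_day=200),
    }

    # Defragmentation cost is roughly proportional to pages rewritten;
    # benefit accrues only on ranges the workload actually scans.
    for predicate, r in index_ranges.items():
        saved_random_ios = r.scans_per_day * r.pages * r.fragmentation
        print(f"{predicate}: cost = {r.pages:,} pages, "
              f"daily benefit ~ {saved_random_ios:,.0f} random I/Os avoided")

Under these assumed numbers, defragmenting only the recent range rewrites a
tenth of the pages yet captures well over 99 percent of the estimated daily
I/O savings.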
Using workload information can be crucial in large data warehouses consisting
of hundreds of indexes, as is typical in enterprise applications. Exploiting
this information is not straightforward, however: two key challenges make it
difficult for DBAs to use workload information for index defragmentation.
First, it is difficult to estimate the impact of defragmenting an
index on the I/O performance of a query that scans that index. Such what-if
analysis of the impact of defragmentation on query performance is an essential
component to enable a workload-driven approach to index defragmentation.
Second, even if the what-if analysis functionality is available, selecting which
indexes to defragment for large databases and workloads is in general a non-
trivial task.
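This second challenge resembles a knapsack-style optimization. The sketch
below shows one way such a selector could be structured, assuming a
hypothetical what-if primitive whatif_scan_cost(index, query, fragmentation)
that estimates a query's I/O cost at a given fragmentation level; neither
the primitive nor the greedy policy is an existing DBMS interface.

    from dataclasses import dataclass

    @dataclass
    class Index:
        name: str
        pages: int
        fragmentation: float

    @dataclass
    class Query:
        index_name: str
        def scans(self, ix):
            return self.index_name == ix.name

    def workload_benefit(ix, queries, whatif_scan_cost):
        """Estimated total I/O saved across the workload if `ix` were
        fully defragmented (fragmentation driven to zero)."""
        return sum(whatif_scan_cost(ix, q, ix.fragmentation)
                   - whatif_scan_cost(ix, q, 0.0)
                   for q in queries if q.scans(ix))

    def select_indexes(indexes, queries, whatif_scan_cost, page_budget):
        """Greedy selection by estimated benefit per page of rewrite
        work, until the batch-window budget (in pages) is exhausted."""
        ranked = sorted(indexes,
                        key=lambda ix: workload_benefit(
                            ix, queries, whatif_scan_cost) / ix.pages,
                        reverse=True)
        selected = []
        for ix in ranked:
            if ix.pages <= page_budget:
                selected.append(ix)
                page_budget -= ix.pages
        return selected

A full selector would also have to weigh choices at the range level rather
than whole indexes, which is part of what makes the problem nontrivial in
general.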
12.2.2 Compression
Data compression is commonly used in modern DBMSs for different reasons,
including reducing storage/archival costs (which is particularly important for