large data warehouses), improving query workload performance by reducing
the I/O costs, and reducing manageability costs by decreasing the time and
storage needed by applications like backup/restore.
While data compression yields significant benefits in the form of reduced
storage costs and reduced I/O, there is a substantial central processing unit
(CPU) cost to be paid in decompressing the data during query processing.
Thus, the decision on when to use compression needs to be made carefully.
Given that compression increases the space of physical design options, there
is a natural motivation to extend automated physical design tools to handle
compression. In order to meet the storage bound as well as to reason about
I/O costs during query execution, it is necessary to perform a quantitative
analysis of the effects of compression. Specifically, given an index, we need to
understand how much space will be saved by compressing it and how the work-
load performance is impacted by compressing it. One of the key challenges in
answering these questions is to estimate the size of an index if it were to
be compressed. With that functionality, we can reuse the what-if approach of
Chapter 5 by appropriately modifying the hypothetical index metadata. Since
the space of physical design options is large, it is important to be able to per-
form this estimation accurately and efficiently. The naïve method of actually
building and compressing the index in order to estimate its size is obviously
not feasible. Thus, an important challenge to incorporate compression as an-
other physical design dimension is to accurately estimate the compressed size
of an index without incurring the cost of actually compressing it. This problem
is challenging because the size of the compressed index significantly depends
on both the data distribution and the specific compression algorithm. This
is in contrast to the size estimation of an uncompressed index, which can be
derived from the schema and table cardinality.
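To make the contrast concrete, the following is a minimal sketch (not the method of any particular system) of the two estimation problems: the uncompressed size follows from cardinality and key width alone, while the compressed size is estimated here by compressing a small random sample of the column and scaling the observed ratio. The overhead factor, sample fraction, and the use of zlib as the compressor are illustrative assumptions.

```python
import random
import zlib

def uncompressed_index_size(row_count, key_width_bytes, overhead=1.3):
    # Uncompressed size follows directly from schema and cardinality:
    # rows x key width, inflated by an assumed structural overhead factor.
    return int(row_count * key_width_bytes * overhead)

def estimated_compressed_size(column_values, row_count, sample_frac=0.01):
    # Sample-based estimate: compress a random sample of the key column
    # and scale the achieved ratio to the full index. The ratio depends
    # on the data distribution and on the compressor (zlib here), which
    # is exactly why no fixed formula suffices for compressed indexes.
    k = max(1, int(len(column_values) * sample_frac))
    sample = random.sample(column_values, k)
    raw = "".join(str(v) for v in sample).encode()
    ratio = len(zlib.compress(raw)) / len(raw)
    return int(uncompressed_index_size(row_count, 8) * ratio)
```

A highly repetitive column compresses to a small fraction of its raw size, while a column of unique random values barely shrinks, so the two calls above can return very different answers for the same row count.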
In addition to the question of which indexes to materialize, an important
decision is to determine how the resulting indexes would be assigned to the
available disk drives (i.e., designing the database layout). A good layout should
result in balanced loads across the storage subsystem; otherwise, the most loaded
disk would quickly become the bottleneck while others remain underutilized.
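The balancing goal can be sketched with a simple greedy heuristic (an illustration, not the procedure used by any specific tool): place the next-heaviest index on the currently least-loaded disk. The index names and per-index load estimates are hypothetical inputs, standing in for the I/O costs a workload analysis would supply.

```python
import heapq

def assign_indexes_to_disks(index_loads, num_disks):
    # Greedy load balancing: repeatedly place the heaviest remaining
    # index on the least-loaded disk. index_loads maps an index name to
    # its estimated I/O load under the workload (hypothetical values).
    heap = [(0.0, d) for d in range(num_disks)]  # (accumulated load, disk id)
    heapq.heapify(heap)
    layout = {}
    for name, load in sorted(index_loads.items(), key=lambda kv: -kv[1]):
        disk_load, disk = heapq.heappop(heap)
        layout[name] = disk
        heapq.heappush(heap, (disk_load + load, disk))
    return layout
```

With loads {idx_a: 10, idx_b: 9, idx_c: 2, idx_d: 1} and two disks, the two heavy indexes land on different disks and the totals come out even, which is precisely the balance a uniform striping scheme does not guarantee per object.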
Traditionally, DBMSs rely on solutions that spread out tables and indexes uni-
formly over all available disk drives (e.g., by relying on redundant arrays of in-
expensive disks, or RAID configurations). Such alternatives are relatively easy
to manage because DBAs do not need to be concerned about specific object
placement. At the same time, when multiple large indexes are accessed con-
currently during execution (e.g., in complex decision support queries), these
solutions may perform suboptimally. The reason is interference between requests
for different objects that are laid out on the same disk. For instance, if
two indexes are laid out on the same disk and need to be read sequentially