Range Partitioning - Physical Database Design

Databases Reference

In-Depth Information

7.8 Range Partitioning and Clustering Indexes

Clustering indexes dramatically improve the efficiency of scans by reducing the I/O

required on the base table by ensuring that records with similar key data are located near

each other within the table storage. As described earlier, range partitioning implicitly

clusters data according to the range partition. However, the design of range partitioning

is often driven predominantly by administration needs (utility isolation, roll-in, and

roll-out), while clustering design is often driven by performance goals. The goals can

therefore conflict. Optimal scans are achieved when clustering index design matches the

partitioning key, as shown in the right-hand picture in Figure 7.7.

However, even when the clustering design and the range partition design do not

coincide, the system can still perform well, with a number of advantages (middle

image):

•

Within each partition, rows are in key order. Since each range partition repre-

sents its own storage object in the physical implementation within the database,

access to the storage object remains clustered, and good I/O characteristics are

achieved.

•

During the scan operation, rows will be retrieved from a set of n pages in the

buffer pool (or disk) where n is the number of partitions. Contrast this with a

scan of a clustered index in a nonpartitioned table, where all reads come from

the same page until it is consumed and clearly the range partitioned design is

not as attractive. However, contrasting with the unclustered case, the I/O sav-

ings are enormous. To summarize this, consider a scan with 200 qualifying

rows, over 10 range partitions (for simplicity we'll assume an even distribution),

and we'll assume 50 records per page, as shown in Table 7.1.

Notice in Table 7.1 that while mismatched range partitioning and clustering is

clearly not as efficient as matched range partitioning and clustering, the I/O benefit is

still very significant, twice as efficient with I/O compared to range partitioning on the

scan column, and 20 times more efficient than the baseline.

7.9 The Full Gestalt: Composite Range and Hash

Partitioning with Multidimensional Clustering

Figure 7.8 illustrates the combination of the various slicing and dicing techniques we

have discussed in the topic. A table T1 is hash partitioned across three servers-server 1,

server 2, and server 3—providing the scale-out benefits of shared-nothing partitioning

as described in Chapter 6. The data for the table is then range partitioned by date

according to month, at each partition on the shared-nothing architecture (i.e., at each

Search WWH ::

Custom Search

Home