Databases Reference
In-Depth Information
7.8 Range Partitioning and Clustering Indexes
Clustering indexes dramatically improve the efficiency of scans by reducing the I/O
required on the base table by ensuring that records with similar key data are located near
each other within the table storage. As described earlier, range partitioning implicitly
clusters data according to the range partition. However, the design of range partitioning
is often driven predominantly by administration needs (utility isolation, roll-in, and
roll-out), while clustering design is often driven by performance goals. The goals can
therefore conflict. Optimal scans are achieved when clustering index design matches the
partitioning key, as shown in the right-hand picture in Figure 7.7.
However, even when the clustering design and the range partition design do not
coincide, the system can still perform well, with a number of advantages (middle
image):
Within each partition, rows are in key order. Since each range partition repre-
sents its own storage object in the physical implementation within the database,
access to the storage object remains clustered, and good I/O characteristics are
achieved.
During the scan operation, rows will be retrieved from a set of n pages in the
buffer pool (or disk) where n is the number of partitions. Contrast this with a
scan of a clustered index in a nonpartitioned table, where all reads come from
the same page until it is consumed and clearly the range partitioned design is
not as attractive. However, contrasting with the unclustered case, the I/O sav-
ings are enormous. To summarize this, consider a scan with 200 qualifying
rows, over 10 range partitions (for simplicity we'll assume an even distribution),
and we'll assume 50 records per page, as shown in Table 7.1.
Notice in Table 7.1 that while mismatched range partitioning and clustering is
clearly not as efficient as matched range partitioning and clustering, the I/O benefit is
still very significant, twice as efficient with I/O compared to range partitioning on the
scan column, and 20 times more efficient than the baseline.
7.9 The Full Gestalt: Composite Range and Hash
Partitioning with Multidimensional Clustering
Figure 7.8 illustrates the combination of the various slicing and dicing techniques we
have discussed in the topic. A table T1 is hash partitioned across three servers-server 1,
server 2, and server 3—providing the scale-out benefits of shared-nothing partitioning
as described in Chapter 6. The data for the table is then range partitioned by date
according to month, at each partition on the shared-nothing architecture (i.e., at each
Search WWH ::




Custom Search