Range Partitioning - Physical Database Design

range. Once again, this situation becomes murkier when global indexes are introduced, since keys in a global index can't easily be dropped in a single operation. Some databases handle global indexes during roll-out by removing the keys for the deleted ranges asynchronously (i.e., perhaps minutes or hours after the range has otherwise been deleted). Metadata is stored with the index to ensure that index-only access does not touch keys in the deleted ranges that are still pending deletion.

7.5 Increased Addressability

There are two major reasons why range partitioning is valuable as an addressability aid. Using a classic 4-byte RID [2], with row addressability defined by page number and record (or slot) number, RIDs can only address into a 3-byte space for pages (1 byte of the 4 being needed for the record or slot number). This restricts the addressability of pages to 0x00FFFFFF, or roughly 16.7 million pages per table. Again, depending on the architecture, this limit may apply to a single table or possibly a higher-level abstraction such as a storage group. Regardless of whether the page size is 4 KB or 32 KB, the restriction can pose serious capacity limitations for databases measured in tens of terabytes (TB): even at 32 KB per page, 16.7 million pages amount to only about 512 GB. Using page sizes larger than 32 KB is not practical for most applications, and few if any commercial database products support it. Range partitioning helps reduce the capacity problem by splitting a table into multiple tables (internally, though not exposed to the application) so that each range can be independently addressed. For example, a table with 10 balanced ranges will have 10 times the capacity (addressability) of a single unpartitioned table.
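The arithmetic behind these limits can be checked with a short sketch. It assumes the 3-byte page / 1-byte slot split described above and treats each range as independently addressable. The function names are illustrative only and do not come from any particular database product.

```python
# Sketch of the addressability arithmetic for a classic 4-byte RID.
# Assumption: 3 bytes of the RID hold the page number, 1 byte the slot number.

KB = 1024
GB = 1024 ** 3


def max_pages(page_bytes: int) -> int:
    """Number of distinct page numbers representable in `page_bytes` bytes."""
    return 2 ** (8 * page_bytes)


def max_capacity_bytes(page_bytes: int, page_size: int, ranges: int = 1) -> int:
    """Upper bound on addressable data, assuming every range gets its own
    independent page address space (hypothetical helper for illustration)."""
    return max_pages(page_bytes) * page_size * ranges


if __name__ == "__main__":
    print(f"pages per table: {max_pages(3):,}")  # 16,777,216 (0x000000..0xFFFFFF)
    for page_size_kb in (4, 32):
        single = max_capacity_bytes(3, page_size_kb * KB)
        ten_ranges = max_capacity_bytes(3, page_size_kb * KB, ranges=10)
        print(
            f"{page_size_kb:>2} KB pages: "
            f"{single // GB} GB unpartitioned, "
            f"{ten_ranges // GB} GB with 10 balanced ranges"
        )
```

Under these assumptions a single unpartitioned table tops out at 64 GB with 4 KB pages and 512 GB with 32 KB pages, which is why the limit matters for databases in the tens of terabytes.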

Another strategy for resolving the storage capacity constraint is to support larger RIDs, possibly even variable-length RIDs. For example, an 8-byte RID with 7 bytes used for page addressing could support over 72 quadrillion (roughly 7.2 × 10^16) pages! However, while larger RIDs resolve the addressability constraint, they incur a storage cost of their own. RIDs are stored on disk as part of index structures to provide pointers from the index keys back into the data pages. Using a wider format for RIDs will significantly increase the storage required for indexes, and likewise the memory required for index operations. RID-based operations that occur through index ANDing and index ORing also become more computationally expensive, with the cost growing roughly linearly with the width of the RID structure. Various techniques have been devised to try to counter the overhead of wider RIDs, including variable-length RIDs. However, suffice it to say that short RIDs provide computational and storage efficiency. Range

[2] The 4-byte RID is the classic RID structure described in C.J. Date, An Introduction to Database Systems, Vol. 1, 8th Ed., Addison-Wesley, 2003. Many other RID formats are in common use, several of them larger and with greater addressability.
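To put rough numbers on the trade-off between RID width and index size discussed in this section, the following sketch compares a 4-byte and an 8-byte RID. It assumes one RID is stored per index entry and ignores key bytes, page overhead, and compression. The entry count is a made-up figure chosen purely for illustration.

```python
# Sketch: address space gained versus index bytes spent when widening RIDs.
# Assumptions (not from the source): one RID per index entry, no compression,
# and a hypothetical index of one billion entries.


def page_address_space(page_bytes: int) -> int:
    """Distinct page numbers representable in `page_bytes` bytes."""
    return 2 ** (8 * page_bytes)


def rid_bytes_in_index(entries: int, rid_width: int) -> int:
    """Bytes spent on RIDs alone in an index holding `entries` entries."""
    return entries * rid_width


if __name__ == "__main__":
    entries = 1_000_000_000  # hypothetical index size

    narrow = rid_bytes_in_index(entries, rid_width=4)
    wide = rid_bytes_in_index(entries, rid_width=8)

    print(f"pages addressable, 3-byte page number: {page_address_space(3):,}")
    print(f"pages addressable, 7-byte page number: {page_address_space(7):,}")
    print(f"RID bytes in index, 4-byte RIDs: {narrow:,}")
    print(f"RID bytes in index, 8-byte RIDs: {wide:,}")
    print(f"extra bytes for the wider format: {wide - narrow:,}")
```

In this simplified model, doubling the RID width doubles the bytes spent on RIDs in every index on the table, whether or not the table ever grows large enough to need the extra address space.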
