range. Once again we see this situation become murkier when global indexes are introduced,
since keys in the global index can't easily be dropped in a single operation. Some
databases solve the handling of global indexes for roll-out by removing the keys for the
deleted ranges asynchronously (i.e., perhaps minutes or hours after the range has otherwise
been deleted). Metadata is stored regarding the index to ensure that index-only
access to the deleted ranges will not access the keys pending deletion.
7.5 Increased Addressability
There are two major reasons why range partitioning is valuable as an addressability aid.
Using a classic 4-byte RID², with row addressability defined by page number and record
(or slot) number, RIDs can only address into a 3-byte space for pages (1 byte of the 4
being needed for the record or slot number). This restricts the addressability of pages to
0x00FFFFFF, or roughly 16.8 million pages per table. Again, depending on the architecture
this limit may apply to a single table or possibly a higher-level abstraction such
as a storage group. Regardless of whether the page size is 4 KB or 32 KB the restriction
can pose serious capacity limitations for databases measured in tens of terabytes (TB).
Using page sizes larger than 32 KB is not practical for most applications, and few if any
commercial database products support it. Range partitioning helps reduce the capacity
problem by splitting a table into multiple tables (internally, though not exposed to the
application) so that each range can be independently addressed. For example, a table
with 10 balanced ranges will have 10 times the capacity (addressability) of a single
unpartitioned table.
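The arithmetic behind these limits can be sketched in a few lines of Python. This is only an illustration under the assumptions stated in the text (3 of the 4 RID bytes address pages, every page number in that space is usable, and ranges are perfectly balanced); the function names are hypothetical and do not correspond to any database product.

```python
# Sketch: capacity implied by the page-addressing portion of a RID.
# Assumes every page number in the addressable space is usable.

KB = 1024
GB = 1024 ** 3


def addressable_pages(page_bytes: int) -> int:
    """Number of distinct page numbers expressible in `page_bytes` bytes."""
    return 2 ** (8 * page_bytes)


def capacity_bytes(page_bytes: int, page_size: int, ranges: int = 1) -> int:
    """Upper bound on table size: pages per range * page size * ranges."""
    return addressable_pages(page_bytes) * page_size * ranges


# Classic 4-byte RID: 3 bytes for the page number, 1 byte for the slot.
pages = addressable_pages(3)                                 # 16,777,216 pages
cap_4k_gb = capacity_bytes(3, 4 * KB) // GB                  # 64 GB
cap_32k_gb = capacity_bytes(3, 32 * KB) // GB                # 512 GB

# Ten balanced ranges, each independently addressed.
cap_32k_10_gb = capacity_bytes(3, 32 * KB, ranges=10) // GB  # 5,120 GB

print(pages, cap_4k_gb, cap_32k_gb, cap_32k_10_gb)
```

Under these assumptions an unpartitioned table tops out at 64 GB with 4 KB pages and 512 GB with 32 KB pages, which is why the limit matters for databases measured in tens of terabytes.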
Another strategy for resolving the storage capacity constraint is to support larger
RIDs, possibly even variable-length RIDs. For example, an 8-byte RID with 7 bytes
used for page addressing could support over 72 quadrillion (about 7 × 10^16) pages! However, while
larger RIDs resolve the addressability constraint, they incur a storage constraint of their
own. RIDs are stored on disk as part of index structures to provide pointers from the
index keys back into the data pages. Using a wider format for RIDs will significantly
increase the storage required for indexes, and correspondingly increase the memory required for
index operations. Likewise, RID-based operations that occur through index ANDing
and index ORing become more computationally expensive, with a cost that grows roughly
linearly with the width of the RID structure. Various techniques have been devised
to counter the overhead of wider RIDs, including variable-length RIDs. However,
suffice it to say that short RIDs provide computational and storage efficiency. Range
2 The 4-byte RID is the classic RID structure described in C.J. Date, An Introduction to Database
Systems, Vol. 1, 8th Ed., Addison-Wesley, 2003. Many other formats for RIDs are commonly in use,
with several having larger size and addressability.
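The trade-off between wider RIDs and index size discussed in this section can be made concrete with a rough estimate. The sketch below assumes one RID is stored per row in every index and ignores key bytes, page overhead, and any RID compression; the row count and index count are hypothetical values chosen only for illustration.

```python
# Sketch: storage spent on RIDs alone for narrow versus wide RIDs.
# Assumes one RID per row in every index; ignores keys and page overhead.


def rid_storage_bytes(num_rows: int, num_indexes: int, rid_width: int) -> int:
    """Bytes consumed by RIDs across all indexes on a table."""
    return num_rows * num_indexes * rid_width


rows = 1_000_000_000   # hypothetical table size
indexes = 4            # hypothetical number of indexes

narrow = rid_storage_bytes(rows, indexes, 4)   # 4-byte RIDs
wide = rid_storage_bytes(rows, indexes, 8)     # 8-byte RIDs
extra_gb = (wide - narrow) / 10**9             # additional decimal gigabytes

# Pages addressable with 7 of the 8 bytes used for the page number.
pages_8byte = 2 ** (8 * 7)

print(narrow, wide, extra_gb, pages_8byte)
```

With these illustrative numbers, doubling the RID width doubles the RID storage from 16 GB to 32 GB across the four indexes, while raising the page-addressing space to 2^56 pages.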