Databases Reference
In-Depth Information
method for a particular database was a complex decision, requiring considerable formal
training. For other types of databases—object-oriented, spatial, temporal, and so on—
the list of potential indexing strategies (R-trees for instance) goes on and will continue
to evolve for the foreseeable future.
Fortunately, for relational databases, the B+tree has properties that span virtually all
of the methods mentioned above and has become the de facto indexing method for the
major relational database systems today. This chapter discusses the basic principles of
indexing used in today's database systems to enhance performance.
2.1 B+tree Index
The B+tree is the primary indexing method supported by DB2, Oracle, and SQL Server.
It features not only fast access, but also the dynamic maintenance that virtually eliminates
the overflow problems that occur in the older hashing and indexed sequential methods.
Let's take a look at a typical B+tree configuration in Figure 2.1. Each nonleaf index node
consists of p tree pointers and p -1 key values. The key values denote where to search to
find rows that have either smaller key values, by taking the tree pointer to the left of the
key, or great or equal key values by taking the tree pointer to the right of the key. Each leaf
index node consists of a series of key and data-pointer combinations that point to each
row. The leaf index nodes (and the associated data blocks) are connected logically by block
pointers so that an ordered sequence of rows can be found quickly. The variable p repre-
sents the order of the B+tree, the fan-out of pointers from one node to the next lower node
in the tree.
The height of a B+tree is the level of the leaf nodes, given that the root node is
defined as Level 1. In Figure 2.1, the order of the B+tree is four and the height is three.
The intermmediate nodes help the database find the desired leaf nodes with very
few I/Os. However, what's stored within the leaf nodes is the real meat. The leaf nodes
will store three very important things: the key, a record identifier (RID) and a pointer to
the next leaf. For example, if the index is defined on a CITY column, the keys of the
index may include NEW YORK, BOSTON, TORONTO, SAN JOSE. For each city
the index will include an identifier of where to find each record in the table that
matches the key. Consider a search through the index where the key value is 'NEW
YORK'. If the table has 400 entries for NEW YORK, the index leaf node will include
one key entry for NEW YORK and identifiers for where to find each of the 400 records
in the base table. These identifier are called record identifiers, or RIDs, and will be dis-
cussed next. The key and the RIDs are the two most critical parts of the leaf content.
Keys are stored within the leaf pages in sorted order. As a result while B+tree
indexes can be used by the DBMS to find a row or set or rows that match a single key,
they can also be used to easily find a range of keys (i.e., a numeric range or alphabetic
range) as well as implicitly return records in sorted order.
Search WWH ::




Custom Search