Basic Indexing Methods - Physical Database Design

Databases Reference

In-Depth Information

method for a particular database was a complex decision, requiring considerable formal

training. For other types of databases—object-oriented, spatial, temporal, and so on—

the list of potential indexing strategies (R-trees for instance) goes on and will continue

to evolve for the foreseeable future.

Fortunately, for relational databases, the B+tree has properties that span virtually all

of the methods mentioned above and has become the de facto indexing method for the

major relational database systems today. This chapter discusses the basic principles of

indexing used in today's database systems to enhance performance.

2.1 B+tree Index

The B+tree is the primary indexing method supported by DB2, Oracle, and SQL Server.

It features not only fast access, but also the dynamic maintenance that virtually eliminates

the overflow problems that occur in the older hashing and indexed sequential methods.

Let's take a look at a typical B+tree configuration in Figure 2.1. Each nonleaf index node

consists of p tree pointers and p -1 key values. The key values denote where to search to

find rows that have either smaller key values, by taking the tree pointer to the left of the

key, or great or equal key values by taking the tree pointer to the right of the key. Each leaf

index node consists of a series of key and data-pointer combinations that point to each

row. The leaf index nodes (and the associated data blocks) are connected logically by block

pointers so that an ordered sequence of rows can be found quickly. The variable p repre-

sents the order of the B+tree, the fan-out of pointers from one node to the next lower node

in the tree.

The height of a B+tree is the level of the leaf nodes, given that the root node is

defined as Level 1. In Figure 2.1, the order of the B+tree is four and the height is three.

The intermmediate nodes help the database find the desired leaf nodes with very

few I/Os. However, what's stored within the leaf nodes is the real meat. The leaf nodes

will store three very important things: the key, a record identifier (RID) and a pointer to

the next leaf. For example, if the index is defined on a CITY column, the keys of the

index may include NEW YORK, BOSTON, TORONTO, SAN JOSE. For each city

the index will include an identifier of where to find each record in the table that

matches the key. Consider a search through the index where the key value is 'NEW

YORK'. If the table has 400 entries for NEW YORK, the index leaf node will include

one key entry for NEW YORK and identifiers for where to find each of the 400 records

in the base table. These identifier are called record identifiers, or RIDs, and will be dis-

cussed next. The key and the RIDs are the two most critical parts of the leaf content.

Keys are stored within the leaf pages in sorted order. As a result while B+tree

indexes can be used by the DBMS to find a row or set or rows that match a single key,

they can also be used to easily find a range of keys (i.e., a numeric range or alphabetic

range) as well as implicitly return records in sorted order.

Search WWH ::

Custom Search

Home