Indexing - Data Structures and Algorithm Analysis

Java Reference

In-Depth Information

could be sorted or organized using a tree structure, thereby imposing a logical or-

der on the records without physically rearranging them. One database might have

several associated index files, each supporting efficient access through a different

key field.

Each record of a database normally has a unique identifier, called the primary

key. For example, the primary key for a set of personnel records might be the

Social Security number or ID number for the individual. Unfortunately, the ID

number is generally an inconvenient value on which to perform a search because

the searcher is unlikely to know it. Instead, the searcher might know the desired

employee's name. Alternatively, the searcher might be interested in finding all

employees whose salary is in a certain range. If these are typical search requests

to the database, then the name and salary fields deserve separate indices. However,

key values in the name and salary indices are not likely to be unique.

A key field such as salary, where a particular key value might be duplicated in

multiple records, is called a secondary key. Most searches are performed using a

secondary key. The secondary key index (or more simply, secondary index) will

associate a secondary key value with the primary key of each record having that

secondary key value. At this point, the full database might be searched directly

for the record with that primary key, or there might be a primary key index (or

primary index) that relates each primary key value with a pointer to the actual

record on disk. In the latter case, only the primary index provides the location of

the actual record on disk, while the secondary indices refer to the primary index.

Indexing is an important technique for organizing large databases, and many

indexing methods have been developed. Direct access through hashing is discussed

in Section 9.4. A simple list sorted by key value can also serve as an index to the

record file. Indexing disk files by sorted lists are discussed in the following section.

Unfortunately, a sorted list does not perform well for insert and delete operations.

A third approach to indexing is the tree index. Trees are typically used to or-

ganize large databases that must support record insertion, deletion, and key range

searches. Section 10.2 briefly describes ISAM, a tentative step toward solving the

problem of storing a large database that must support insertion and deletion of

records. Its shortcomings help to illustrate the value of tree indexing techniques.

Section 10.3 introduces the basic issues related to tree indexing. Section 10.4 in-

troduces the 2-3 tree, a balanced tree structure that is a simple form of the B-tree

covered in Section 10.5. B-trees are the most widely used indexing method for

large disk-based databases, and for implementing file systems. Since they have

such great practical importance, many variations have been invented. Section 10.5

begins with a discussion of the variant normally referred to simply as a “B-tree.”

Section 10.5.1 presents the most widely implemented variant, the B + -tree.

Search WWH ::

Custom Search

Home