Indexing - Data Structures and Algorithm Analysis

Java Reference

In-Depth Information

LINEAR INDEX

37

42

52

73

98

73

52

98

37

42

DATABASE RECORDS

Figure10.1 Linear indexing for variable-length records. Each record in the

index file is of fixed length and contains a pointer to the beginning of the corre-

sponding record in the database file.

10.1

Linear Indexing

A linear index is an index file organized as a sequence of key/pointer pairs where

the keys are in sorted order and the pointers either (1) point to the position of the

complete record on disk, (2) point to the position of the primary key in the primary

index, or (3) are actually the value of the primary key. Depending on its size, a

linear index might be stored in main memory or on disk. A linear index provides

a number of advantages. It provides convenient access to variable-length database

records, because each entry in the index file contains a fixed-length key field and

a fixed-length pointer to the beginning of a (variable-length) record as shown in

Figure 10.1. A linear index also allows for efficient search and random access to

database records, because it is amenable to binary search.

If the database contains enough records, the linear index might be too large

to store in main memory. This makes binary search of the index more expensive

because many disk accesses would typically be required by the search process. One

solution to this problem is to store a second-level linear index in main memory that

indicates which disk block in the index file stores a desired key. For example, the

linear index on disk might reside in a series of 1024-byte blocks. If each key/pointer

pair in the linear index requires 8 bytes (a 4-byte key and a 4-byte pointer), then

128 key/pointer pairs are stored per block. The second-level index, stored in main

memory, consists of a simple table storing the value of the key in the first position

of each block in the linear index file. This arrangement is shown in Figure 10.2. If

the linear index requires 1024 disk blocks (1MB), the second-level index contains

only 1024 entries, one per disk block. To find which disk block contains a desired

search key value, first search through the 1024-entry table to find the greatest value

less than or equal to the search key. This directs the search to the proper block in

the index file, which is then read into memory. At this point, a binary search within

this block will produce a pointer to the actual record in the database. Because the

Data Structures and Algorithm Analysis

Search WWH ::

Custom Search

Home