Basic Indexing Methods - Physical Database Design

Databases Reference

In-Depth Information

are found by taking the AND of the bit vectors, as shown in Figure 2.5 for the intersec-

tion of reg-name = 'southwest' and sales-id = 412. Bitmap indexes are now quite com-

mon in commercial systems, for example, IBM's DB2, Oracle, and SQL Server.

However, despite their efficiency in quickly identifying records, bitmap indexes are

not as easily extended to new keys as B+tree indexes. They also do not store data in

sorted order that makes B+trees so valuable for range predicates and implied sorting of

data. As a result they are best suited to databases where data is accessed by point values,

and new data values are infrequently added to the database.

2.4 Record Identifiers

Indexes always store a reference back to where the key values can be found in the base

table. These identifiers are called record identifiers, or RIDs. RIDs are needed because

the index itself often does not include all the column data required to answer a query

requiring the database to go back to the base table for the extra information. Thanks to

the index the database knows exactly where to look in the base table, so performance is

still dramatically improved compared to a table scan, but not quite as good as it would

be if the query could be answered with just the data residing within the index. A simple

example would be a query to a bank account. A table for a bank account table would

typically have an index on the account number. When a customer or teller wants to find

out information about the account the query is issued to retrieve the data for a specific

account number. The database scans the index to find the key that matches the account

number, but the additional information such as name, address, balance, etc is found

within the base table itself. The index scan returns the RID telling the database where to

find the full table record.

Of course, a composite index can be defined that includes most or all of the col-

umns in the base table which would obviate the need to go back to the base table for

information. However, creating wide indexes that include many columns is unreason-

ably expensive in terms of storage costs, and reduces the number of keys that can be

returned in a single I/O of the index leaf pages, so it is rarely done.

Indexes can return a single RID or lists of RIDs depending on how many keys

qualified for the search and how many RIDs were associated with each key value.

The classic format for RIDs (see C.J. Date 2003) used a 4 byte model. Data wthin

the tables is stored on storage pages (4KB or 8KB would be typical). Each page is num-

bered, 1,2,3 etc. Records within each page are similarly numbered. The 4 byte model

for RIDs used 3 bytes to represent a page number and one byte to represent a record

number. This structure limited table size to three byte addressability (xFFFFFF or

16777215 pages). Similarly the number of records that could be addressed on a data-

base page were limited to one byte (xFF or 256 records). The 4 byte RID design was

effective in clearly locating a record on a page. However, both limits became too con-

strictive over time and database vendors required designs that would not limit them to 3

Search WWH ::

Custom Search

Home