Databases Reference
In-Depth Information
are found by taking the AND of the bit vectors, as shown in Figure 2.5 for the intersec-
tion of reg-name = 'southwest' and sales-id = 412. Bitmap indexes are now quite com-
mon in commercial systems, for example, IBM's DB2, Oracle, and SQL Server.
However, despite their efficiency in quickly identifying records, bitmap indexes are
not as easily extended to new keys as B+tree indexes. They also do not store data in
sorted order that makes B+trees so valuable for range predicates and implied sorting of
data. As a result they are best suited to databases where data is accessed by point values,
and new data values are infrequently added to the database.
2.4 Record Identifiers
Indexes always store a reference back to where the key values can be found in the base
table. These identifiers are called record identifiers, or RIDs. RIDs are needed because
the index itself often does not include all the column data required to answer a query
requiring the database to go back to the base table for the extra information. Thanks to
the index the database knows exactly where to look in the base table, so performance is
still dramatically improved compared to a table scan, but not quite as good as it would
be if the query could be answered with just the data residing within the index. A simple
example would be a query to a bank account. A table for a bank account table would
typically have an index on the account number. When a customer or teller wants to find
out information about the account the query is issued to retrieve the data for a specific
account number. The database scans the index to find the key that matches the account
number, but the additional information such as name, address, balance, etc is found
within the base table itself. The index scan returns the RID telling the database where to
find the full table record.
Of course, a composite index can be defined that includes most or all of the col-
umns in the base table which would obviate the need to go back to the base table for
information. However, creating wide indexes that include many columns is unreason-
ably expensive in terms of storage costs, and reduces the number of keys that can be
returned in a single I/O of the index leaf pages, so it is rarely done.
Indexes can return a single RID or lists of RIDs depending on how many keys
qualified for the search and how many RIDs were associated with each key value.
The classic format for RIDs (see C.J. Date 2003) used a 4 byte model. Data wthin
the tables is stored on storage pages (4KB or 8KB would be typical). Each page is num-
bered, 1,2,3 etc. Records within each page are similarly numbered. The 4 byte model
for RIDs used 3 bytes to represent a page number and one byte to represent a record
number. This structure limited table size to three byte addressability (xFFFFFF or
16777215 pages). Similarly the number of records that could be addressed on a data-
base page were limited to one byte (xFF or 256 records). The 4 byte RID design was
effective in clearly locating a record on a page. However, both limits became too con-
strictive over time and database vendors required designs that would not limit them to 3
Search WWH ::




Custom Search