Indexing - Data Structures and Algorithm Analysis


among the cylinders, sorting the records within each cylinder, and updating both the system index table and the within-cylinder block table. Such reorganization was typical of database systems during the 1960s and would normally be done nightly or weekly.

10.3 Tree-based Indexing

Linear indexing is efficient when the database is static, that is, when records are inserted and deleted rarely or never. ISAM is adequate for a limited number of updates, but not for frequent changes. Because it has essentially two levels of indexing, ISAM will also break down for a truly large database where the number of cylinders is too great for the top-level index to fit in main memory.
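Why a linear index favors static data can be seen in a small in-memory sketch. The class `LinearIndexDemo` and its methods are hypothetical, and the sketch stores bare `int` keys in one sorted array rather than key/pointer pairs on disk. Lookups cost O(log n) comparisons by binary search, but every insertion must shift each entry that follows the insertion point.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch of a sorted linear index held in memory. Only keys are
 * stored (a simplification: a real linear index pairs each key with a pointer).
 * Lookups use binary search; every insertion must keep the array sorted.
 */
public class LinearIndexDemo {
    private int[] keys = new int[16];
    private int n = 0;          // number of keys in use
    private long shifted = 0;   // total entries moved right by insertions

    /** Binary search over the sorted prefix: O(log n) comparisons. */
    public boolean contains(int key) {
        return Arrays.binarySearch(keys, 0, n, key) >= 0;
    }

    /** Insert in sorted position; every entry after that position shifts right. */
    public void insert(int key) {
        if (n == keys.length) keys = Arrays.copyOf(keys, 2 * n); // grow storage
        int pos = Arrays.binarySearch(keys, 0, n, key);
        if (pos < 0) pos = -(pos + 1);                           // insertion point
        System.arraycopy(keys, pos, keys, pos + 1, n - pos);     // open a gap
        keys[pos] = key;
        shifted += n - pos;
        n++;
    }

    public int size() { return n; }
    public long shiftedEntries() { return shifted; }

    public static void main(String[] args) {
        LinearIndexDemo idx = new LinearIndexDemo();
        // Descending keys always land at position 0, so every existing entry moves.
        for (int k = 1000; k >= 1; k--) idx.insert(k);
        System.out.println(idx.size() + " keys, " + idx.shiftedEntries() + " shifts"); // 1000 keys, 499500 shifts
        System.out.println(idx.contains(500)); // true
    }
}
```

Inserting the same 1,000 keys in ascending order would shift nothing, so the cost depends on where new keys land; in the worst case, n insertions move Θ(n²) entries in total, which is why frequent updates are costly for a sorted linear index.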

In their most general form, database applications have the following characteristics:

1. Large sets of records that are frequently updated.

2. Search is by one or a combination of several keys.

3. Key range queries or min/max queries are used.

For such databases, a better organization must be found. One approach would be to use the binary search tree (BST) to store primary and secondary key indices. BSTs can store duplicate key values, they provide efficient insertion and deletion as well as efficient search, and they can perform efficient range queries. When there is enough main memory, the BST is a viable option for implementing both primary and secondary key indices.
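The BST properties listed above (duplicate keys, insertion and deletion, search, and range or min queries) can be seen in a minimal in-memory sketch. The class `BSTIndex`, its methods, and the int key / int record ID representation are assumptions for illustration. Duplicates are handled by sending equal keys to the right subtree, which is one common convention among several, and no balancing is done, so the cost of each operation depends on the depth of the tree.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;

/**
 * Hypothetical sketch of an in-memory BST used as a key index. Int keys map
 * to int record IDs (an assumption). Duplicate keys are allowed: an entry whose
 * key equals the current node's key goes into the right subtree.
 */
public class BSTIndex {
    private static final class Node {
        final int key;
        final int recordId;
        Node left, right;
        Node(int key, int recordId) { this.key = key; this.recordId = recordId; }
    }

    private Node root;
    private int count;

    public int size() { return count; }

    /** Insert an entry; cost is proportional to the depth of the new leaf. */
    public void insert(int key, int recordId) {
        root = insert(root, key, recordId);
        count++;
    }

    private Node insert(Node rt, int key, int recordId) {
        if (rt == null) return new Node(key, recordId);
        if (key < rt.key) rt.left = insert(rt.left, key, recordId);
        else rt.right = insert(rt.right, key, recordId); // equal keys go right
        return rt;
    }

    /** Remove one entry with the given key; returns false if none exists. */
    public boolean remove(int key) {
        int before = count;
        root = remove(root, key);
        return count < before;
    }

    private Node remove(Node rt, int key) {
        if (rt == null) return null;
        if (key < rt.key) { rt.left = remove(rt.left, key); return rt; }
        if (key > rt.key) { rt.right = remove(rt.right, key); return rt; }
        count--;                                 // found a matching entry
        if (rt.left == null) return rt.right;
        if (rt.right == null) return rt.left;
        Node succ = rt.right;                    // smallest entry in right subtree
        while (succ.left != null) succ = succ.left;
        Node repl = new Node(succ.key, succ.recordId);
        repl.left = rt.left;
        repl.right = removeMin(rt.right);
        return repl;
    }

    private Node removeMin(Node rt) {
        if (rt.left == null) return rt.right;
        rt.left = removeMin(rt.left);
        return rt;
    }

    /** Smallest key in the index (a min query): follow left links. */
    public int minKey() {
        if (root == null) throw new NoSuchElementException("empty index");
        Node rt = root;
        while (rt.left != null) rt = rt.left;
        return rt.key;
    }

    /** Record IDs of all entries with lo <= key <= hi, in key order. */
    public List<Integer> rangeQuery(int lo, int hi) {
        List<Integer> out = new ArrayList<>();
        range(root, lo, hi, out);
        return out;
    }

    private void range(Node rt, int lo, int hi, List<Integer> out) {
        if (rt == null) return;
        if (lo < rt.key) range(rt.left, lo, hi, out);   // left keys are < rt.key
        if (lo <= rt.key && rt.key <= hi) out.add(rt.recordId);
        if (rt.key <= hi) range(rt.right, lo, hi, out); // right keys are >= rt.key
    }

    public static void main(String[] args) {
        BSTIndex idx = new BSTIndex();
        int[] keys = {50, 30, 70, 30, 60, 80, 20};
        for (int i = 0; i < keys.length; i++) idx.insert(keys[i], 100 + i);
        System.out.println(idx.rangeQuery(25, 60)); // [101, 103, 100, 104]
        idx.remove(30);
        System.out.println(idx.rangeQuery(25, 60)); // [103, 100, 104]
        System.out.println(idx.minKey());           // 20
    }
}
```

The range query descends into a subtree only when that subtree could hold keys in [lo, hi], so its cost is roughly the depth of the tree plus the number of entries reported; on an unbalanced tree that depth can be large.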

Unfortunately, the BST can become unbalanced. Even under relatively good conditions, the depth of leaf nodes can easily vary by a factor of two. This might not be a significant concern when the tree is stored in main memory because the time required is still Θ(log n) for search and update. When the tree is stored on disk, however, the depth of nodes in the tree becomes crucial. Every time a BST node B is visited, it is necessary to visit all nodes along the path from the root to B. Each node on this path must be retrieved from disk. Each disk access returns a block of information. If a node is on the same block as its parent, then the cost to find that node is trivial once its parent is in main memory. It is therefore desirable to keep subtrees together on the same block. Unfortunately, a node is often not on the same block as its parent. Consequently, each access to a BST node could potentially require that another block be read from disk. Using a buffer pool to store multiple blocks in memory can mitigate disk access problems if BST accesses display good locality of reference. But a buffer pool cannot eliminate disk I/O entirely. The problem becomes greater if the BST is unbalanced, because nodes deep in the tree can cause many disk blocks to be read. Thus, there are two significant issues that must be addressed to have efficient search from a disk-based
