Java Reference
In-Depth Information
among the cylinders, sorting the records within each cylinder, and updating both
the system index table and the within-cylinder block table. Such reorganization
was typical of database systems during the 1960s and would normally be done
each night or weekly.
10.3
Tree-based Indexing
Linear indexing is efficient when the database is static, that is, when records are
inserted and deleted rarely or never. ISAM is adequate for a limited number of
updates, but not for frequent changes. Because it has essentially two levels of
indexing, ISAM will also break down for a truly large database where the number
of cylinders is too great for the top-level index to fit in main memory.
In their most general form, database applications have the following character-
istics:
1. Large sets of records that are frequently updated.
2. Search is by one or a combination of several keys.
3. Key range queries or min/max queries are used.
For such databases, a better organization must be found. One approach would
be to use the binary search tree (BST) to store primary and secondary key indices.
BSTs can store duplicate key values, they provide efficient insertion and deletion as
well as efficient search, and they can perform efficient range queries. When there
is enough main memory, the BST is a viable option for implementing both primary
and secondary key indices.
Unfortunately, the BST can become unbalanced. Even under relatively good
conditions, the depth of leaf nodes can easily vary by a factor of two. This might
not be a significant concern when the tree is stored in main memory because the
time required is still (log n) for search and update. When the tree is stored on
disk, however, the depth of nodes in the tree becomes crucial. Every time a BST
node B is visited, it is necessary to visit all nodes along the path from the root to B.
Each node on this path must be retrieved from disk. Each disk access returns a
block of information. If a node is on the same block as its parent, then the cost to
find that node is trivial once its parent is in main memory. Thus, it is desirable to
keep subtrees together on the same block. Unfortunately, many times a node is not
on the same block as its parent. Thus, each access to a BST node could potentially
require that another block to be read from disk. Using a buffer pool to store multiple
blocks in memory can mitigate disk access problems if BST accesses display good
locality of reference. But a buffer pool cannot eliminate disk I/O entirely. The
problem becomes greater if the BST is unbalanced, because nodes deep in the tree
have the potential of causing many disk blocks to be read. Thus, there are two
significant issues that must be addressed to have efficient search from a disk-based
Search WWH ::




Custom Search