Java Reference
In-Depth Information
neighboring sibling, if possible. If the sibling is also full, then these two nodes split
into three. Similarly, when a node underflows, it is combined with its two siblings,
and the total reduced to two nodes. Thus, the nodes are always at least two thirds
full. 2
10.5.2
B-Tree Analysis
The asymptotic cost of search, insertion, and deletion of records from B-trees,
B + -trees, and B -trees is (log n) where n is the total number of records in the
tree. However, the base of the log is the (average) branching factor of the tree.
Typical database applications use extremely high branching factors, perhaps 100 or
more. Thus, in practice the B-tree and its variants are extremely shallow.
As an illustration, consider a B + -tree of order 100 and leaf nodes that contain
up to 100 records. A B + -tree with height one (that is, just a single leaf node) can
have at most 100 records. A B + -tree with height two (a root internal node whose
children are leaves) must have at least 100 records (2 leaves with 50 records each).
It has at most 10,000 records (100 leaves with 100 records each). A B + -tree with
height three must have at least 5000 records (two second-level nodes with 50 chil-
dren containing 50 records each) and at most one million records (100 second-level
nodes with 100 full children each). A B + -tree with height four must have at least
250,000 records and at most 100 million records. Thus, it would require an ex-
tremely large database to generate a B + -tree of more than height four.
The B + -tree split and insert rules guarantee that every node (except perhaps the
root) is at least half full. So they are on average about 3=4 full. But the internal
nodes are purely overhead, since the keys stored there are used only by the tree to
direct search, rather than store actual data. Does this overhead amount to a signifi-
cant use of space? No, because once again the high fan-out rate of the tree structure
means that the vast majority of nodes are leaf nodes. Recall (from Section 6.4) that
a full K-ary tree has approximately 1=K of its nodes as internal nodes. This means
that while half of a full binary tree's nodes are internal nodes, in a B + -tree of order
100 probably only about 1=75 of its nodes are internal nodes. This means that the
overhead associated with internal nodes is very low.
We can reduce the number of disk fetches required for the B-tree even more
by using the following methods. First, the upper levels of the tree can be stored in
main memory at all times. Because the tree branches so quickly, the top two levels
(levels 0 and 1) require relatively little space. If the B-tree is only height four, then
2 This concept can be extended further if higher space utilization is required. However, the update
routines become much more complicated. I once worked on a project where we implemented 3-for-4
node split and merge routines. This gave better performance than the 2-for-3 node split and merge
routines of the B -tree. However, the spitting and merging routines were so complicated that even
their author could no longer understand them once they were completed!
Search WWH ::




Custom Search