Indexing - Data Structures and Algorithm Analysis

Java Reference

In-Depth Information

Note that this insertion process is guaranteed to keep all nodes at least half full.

For example, when we attempt to insert into a full internal node of a B-tree of order

four, there will now be five children that must be dealt with. The node is split into

two nodes containing two keys each, thus retaining the B-tree property. The middle

of the five children is promoted to its parent.

B + -Trees

10.5.1

The previous section mentioned that B-trees are universally used to implement

large-scale disk-based systems. Actually, the B-tree as described in the previ-

ous section is almost never implemented, nor is the 2-3 tree as described in Sec-

tion 10.4. What is most commonly implemented is a variant of the B-tree, called

the B + -tree. When greater efficiency is required, a more complicated variant known

as the B -tree is used.

When data are static, a linear index provides an extremely efficient way to

search. The problem is how to handle those pesky inserts and deletes. We could try

to keep the core idea of storing a sorted array-based list, but make it more flexible

by breaking the list into manageable chunks that are more easily updated. How

might we do that? First, we need to decide how big the chunks should be. Since

the data are on disk, it seems reasonable to store a chunk that is the size of a disk

block, or a small multiple of the disk block size. If the next record to be inserted

belongs to a chunk that hasn't filled its block then we can just insert it there. The

fact that this might cause other records in that chunk to move a little bit in the array

is not important, since this does not cause any extra disk accesses so long as we

move data within that chunk. But what if the chunk fills up the entire block that

contains it? We could just split it in half. What if we want to delete a record? We

could just take the deleted record out of the chunk, but we might not want a lot of

near-empty chunks. So we could put adjacent chunks together if they have only

a small amount of data between them. Or we could shuffle data between adjacent

chunks that together contain more data. The big problem would be how to find

the desired chunk when processing a record with a given key. Perhaps some sort

of tree-like structure could be used to locate the appropriate chunk. These ideas

are exactly what motivate the B + -tree. The B + -tree is essentially a mechanism for

managing a sorted array-based list, where the list is broken into chunks.

The most significant difference between the B + -tree and the BST or the stan-

dard B-tree is that the B + -tree stores records only at the leaf nodes. Internal nodes

store key values, but these are used solely as placeholders to guide the search. This

means that internal nodes are significantly different in structure from leaf nodes.

Internal nodes store keys to guide the search, associating each key with a pointer

to a child B + -tree node. Leaf nodes store actual records, or else keys and pointers

to actual records in a separate disk file if the B + -tree is being used purely as an

index. Depending on the size of a record as compared to the size of a key, a leaf

Search WWH ::

Custom Search

Home