B-Tree Structure Modifications - Transaction Processing

Information Technology Reference

In-Depth Information

For simplicity, we assume that the tuples to be loaded into relation r.X;V/reside

in a source file f that has been sorted in ascending key order. The load begins

with an initially empty relation whose B-tree thus consists of a single, empty page.

Simultaneous accesses by user transactions are prevented by retaining the relation

r in the system catalog as invisible to transactions (or by X-locking the relation by

the multi-granular locking protocol to be defined in Chap. 9 ). The relation is made

visible only after the load has been completed. The load process scans the file f in

chunks of records, maintaining a cursor z that stores the key of the last tuple inserted

into the relation. The load begins with z D1 .

Since no processes other than the load process can update the relation, and since

the tuples are inserted in ascending key order, the insertions always go to the last leaf

page, which is split after being filled. To guarantee the balance of the B-tree during

the load, we must assume that the fill-factor used is somewhat less than 100%, so

that when a full leaf page is split at least two tuples go to the new sibling page. This

is an easy modification to the page-split algorithm (Algorithm 8.2 ).

We wish to save in logging, so that instead of logging all the tuples inserted into a

leaf page, we log only the key range (i.e., the least and greatest keys) of the inserted

tuples. More specifically, when the records within key range Œx; y in file f are

inserted into leaf page p of the B-tree, the insertion is logged with the redo-only log

record

h populate-leaf-page ;p;f;x;a x ;y i ;

(8.6)

where p and f are the identifiers of the page and the file, respectively, and a x is the

address in file f of the record that contains the tuple with key x.The LSN of the log

record is stamped in the P AGE -LSN field of page p. If in the redo pass of recovery

we have to redo the insertion of those tuples, we retrieve from the file f the set of

tuples in the logged key range Œx; y and insert the tuples into the page p.

Naturally, the use of a light-weight log record such as the above implies that we

have to keep available and unchanged the additional external information (the file

f ) necessary for its interpretation, as long as the effect of the operation is not yet

reflected in the disk version of the page. As such dependence of database durability

on external information may be considered undesirable or harmful, it is wise to

arrange that the leaf pages created by the load process are flushed onto disk as soon

as possible. If taking a complete checkpoint is out of the question, we may at least

push a leaf page to the front of the LRU chain as soon as it has been created, so that

it becomes the first page to be evicted from the buffer.

To cope with failures, the load process is made restartable as follows. When the

load process starts, it first traverses to the last leaf page so as to find out the greatest

key z already inserted. Then it writes the log record

h start-load ;r; z ;f i ;

(8.7)

where r and f are the identifiers of the relation and the file, respectively. Then the

load process begins inserting tuples .x; v / with keys x> z , repeatedly filling the last

Transaction Processing

Search WWH ::

Custom Search

Home