Information Technology Reference
In-Depth Information
For simplicity, we assume that the tuples to be loaded into relation r.X;V/reside
in a source file f that has been sorted in ascending key order. The load begins
with an initially empty relation whose B-tree thus consists of a single, empty page.
Simultaneous accesses by user transactions are prevented by retaining the relation
r in the system catalog as invisible to transactions (or by X-locking the relation by
the multi-granular locking protocol to be defined in Chap. 9 ). The relation is made
visible only after the load has been completed. The load process scans the file f in
chunks of records, maintaining a cursor z that stores the key of the last tuple inserted
into the relation. The load begins with z D1 .
Since no processes other than the load process can update the relation, and since
the tuples are inserted in ascending key order, the insertions always go to the last leaf
page, which is split after being filled. To guarantee the balance of the B-tree during
the load, we must assume that the fill-factor used is somewhat less than 100%, so
that when a full leaf page is split at least two tuples go to the new sibling page. This
is an easy modification to the page-split algorithm (Algorithm 8.2 ).
We wish to save in logging, so that instead of logging all the tuples inserted into a
leaf page, we log only the key range (i.e., the least and greatest keys) of the inserted
tuples. More specifically, when the records within key range Œx; y in file f are
inserted into leaf page p of the B-tree, the insertion is logged with the redo-only log
record
h populate-leaf-page ;p;f;x;a x ;y i ;
(8.6)
where p and f are the identifiers of the page and the file, respectively, and a x is the
address in file f of the record that contains the tuple with key x.The LSN of the log
record is stamped in the P AGE -LSN field of page p. If in the redo pass of recovery
we have to redo the insertion of those tuples, we retrieve from the file f the set of
tuples in the logged key range Œx; y and insert the tuples into the page p.
Naturally, the use of a light-weight log record such as the above implies that we
have to keep available and unchanged the additional external information (the file
f ) necessary for its interpretation, as long as the effect of the operation is not yet
reflected in the disk version of the page. As such dependence of database durability
on external information may be considered undesirable or harmful, it is wise to
arrange that the leaf pages created by the load process are flushed onto disk as soon
as possible. If taking a complete checkpoint is out of the question, we may at least
push a leaf page to the front of the LRU chain as soon as it has been created, so that
it becomes the first page to be evicted from the buffer.
To cope with failures, the load process is made restartable as follows. When the
load process starts, it first traverses to the last leaf page so as to find out the greatest
key z already inserted. Then it writes the log record
h start-load ;r; z ;f i ;
(8.7)
where r and f are the identifiers of the relation and the file, respectively. Then the
load process begins inserting tuples .x; v / with keys x> z , repeatedly filling the last
Search WWH ::




Custom Search