11.4 Extraction of the Index Records
Algorithm 11.1 is used to scan, in primary-key order, the sparse B-tree primary
index of the relation, to extract the index records for the secondary index
being constructed, and to store the records onto disk. This is done using
repeated calls of find-page-for-bulk-read(p, p′, Cursor, ∞) (see Sect. 10.3),
where the shared variable Cursor holds the primary key of the next tuple to be
scanned.
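The structure of this scan can be illustrated with the following Python sketch. The LeafPage layout, the in-memory page list, and the helper find_page_for_bulk_read are simplified stand-ins, introduced here only for illustration, for the sparse B-tree and for the call find-page-for-bulk-read(p, p′, Cursor, ∞); no latching is modelled.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

INF = float("inf")  # stands in for the artificial highest primary-key value

@dataclass
class LeafPage:
    low_key: float                   # lowest primary key covered by the page
    tuples: List[Tuple[int, str]]    # (primary key, secondary-key attribute) pairs

def find_page_for_bulk_read(pages: List[LeafPage], key: float):
    """Toy stand-in for find-page-for-bulk-read(p, p', Cursor, infinity):
    return the leaf page covering `key` and its successor page (or None)."""
    for i, page in enumerate(pages):
        nxt: Optional[LeafPage] = pages[i + 1] if i + 1 < len(pages) else None
        if nxt is None or key < nxt.low_key:
            return page, nxt
    raise KeyError(key)

def extract_index_records(pages: List[LeafPage], cursor: float = -INF):
    """Scan in primary-key order from `cursor`, emitting one
    (secondary key, primary key) index record per tuple."""
    records = []
    while pages and cursor < INF:
        page, nxt = find_page_for_bulk_read(pages, cursor)
        records.extend((sk, pk) for pk, sk in page.tuples if pk >= cursor)
        cursor = nxt.low_key if nxt else INF   # Cursor := first key of the next page
    return records

pages = [LeafPage(-INF, [(1, "ada"), (2, "bob")]), LeafPage(3, [(3, "eve")])]
print(extract_index_records(pages))            # [('ada', 1), ('bob', 2), ('eve', 3)]
```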
We assume that the system-catalog page that stores index information for the
relation contains a variable that tells the state of the secondary index, that is, whether
the index is “under construction,” “available,” or “does not exist.” The updating of
this information is protected by a short-duration write latch on the page containing
the information. The updating of the shared variable Cursor is protected by a write
latch kept just for the instant of setting its value.
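As a rough model, the instantaneous write latch on Cursor can be pictured as a mutex held only while the value is read or written; the class below is merely illustrative, since a real latch is a cheaper, lower-level primitive than a Python lock:

```python
import threading

class SharedCursor:
    """Model of the shared variable Cursor: every access holds a
    latch (here a mutex) only for the instant of touching the value."""

    def __init__(self, initial_key: float = float("-inf")):
        self._latch = threading.Lock()
        self._key = initial_key

    def advance_to(self, new_key: float) -> None:
        with self._latch:            # write latch held just while setting the value
            self._key = new_key

    def value(self) -> float:
        with self._latch:
            return self._key
```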
At the start of the scan, the state of the index in the system catalog is updated
from “does not exist” to “under construction.” For the short duration of this update,
the relation is S-locked so as to prevent an update action by a transaction seeing the
index state as “does not exist” and later another update action (or an undo action) by
the same transaction seeing the state as “under construction.” No locks other than
the short-duration S lock are acquired on the relation or on the tuples read during
the scan.
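Under these rules, the start of the scan might be sketched as follows. The catalog page, its write latch, and the relation-level S lock are modelled here with ordinary Python objects purely for illustration; in particular, a mutex is not a shared lock, and real latches and locks are managed by the latch and lock managers.

```python
import threading
from enum import Enum

class IndexState(Enum):
    DOES_NOT_EXIST = "does not exist"
    UNDER_CONSTRUCTION = "under construction"
    AVAILABLE = "available"

class CatalogPage:
    """Toy system-catalog page holding the state of the secondary index."""
    def __init__(self):
        self.write_latch = threading.Lock()      # short-duration write latch
        self.index_state = IndexState.DOES_NOT_EXIST

def begin_index_construction(catalog: CatalogPage,
                             relation_s_lock: threading.Lock) -> None:
    """Switch the index state from "does not exist" to "under construction"
    while the relation is locked for the short duration of the update
    (the mutex here only stands in for a short-duration S lock)."""
    with relation_s_lock:                        # short-duration S lock on the relation
        with catalog.write_latch:                # latch the catalog page for the update
            assert catalog.index_state is IndexState.DOES_NOT_EXIST
            catalog.index_state = IndexState.UNDER_CONSTRUCTION
    # No further locks are acquired on the relation or its tuples during the scan.
```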
The initial value of Cursor is −∞, the lowest primary-key value. During the traversal of the B-tree, pages are read-latched in the usual way, as explained earlier in Chaps. 7 and 10. The scan ends when Cursor reaches the highest primary-key value ∞.
To address the problem of restartability, the relation is scanned in bulks of
tuples that fit in main memory, each such bulk is sorted and forced to disk, and
a checkpoint record is written to the log that contains the current value of Cursor
and the addresses of the sorted bulks on disk. In the event of a failure during the
scan, the scan is restarted at the Cursor value recorded in the most recent checkpoint record on the log disk.
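The checkpoint-and-restart logic might look roughly like this; the JSON-line log format, the file names, and the helper names are assumptions made only so that the sketch is self-contained:

```python
import json
import os

def log_scan_checkpoint(log_path: str, cursor_value: float, bulk_files: list) -> None:
    """Append a checkpoint record carrying the current Cursor value and the
    addresses (here, file names) of the sorted bulks written so far, and
    force it to the log disk."""
    with open(log_path, "a") as log:
        log.write(json.dumps({"type": "checkpoint",
                              "cursor": cursor_value,
                              "bulks": bulk_files}) + "\n")
        log.flush()
        os.fsync(log.fileno())       # make sure the record reaches the log disk

def restart_point(log_path: str):
    """After a failure, return the Cursor value and bulk list of the most
    recent checkpoint record, or the scan's starting point if none exists."""
    cursor, bulks = float("-inf"), []
    try:
        with open(log_path) as log:
            for line in log:
                rec = json.loads(line)
                if rec.get("type") == "checkpoint":
                    cursor, bulks = rec["cursor"], rec["bulks"]
    except FileNotFoundError:
        pass
    return cursor, bulks
```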
A main-memory priority queue is used to create the sorted bulks of
index records. In Algorithm 11.1, after performing the call find-page-for-bulk-read(p, p′, Cursor, ∞), the secondary-index records for the tuples in page p are pushed into the priority queue, and the shared variable Cursor is advanced to the primary key of the first tuple in the page p′ next to page p. Whenever the
priority queue has no room for the index records for the tuples in the leaf page
p currently being scanned, the records currently in the queue are pulled out and
written to a new file, the file is flushed onto disk, and a checkpoint record is
logged.
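A rough sketch of this queue-and-flush loop is given below, reusing the toy LeafPage layout from the earlier sketch; the capacity limit and the write_bulk and checkpoint callbacks are assumptions standing in for the real main-memory budget, the forced file writes, and the log writes:

```python
import heapq

def build_sorted_bulks(leaf_pages, capacity, write_bulk, checkpoint):
    """Push the secondary-index records of each scanned leaf page into a
    main-memory priority queue; when the queue cannot accommodate a page's
    records, drain it in sorted order into a new bulk and log a checkpoint
    carrying the current Cursor value."""
    heap, cursor = [], float("-inf")
    for i, page in enumerate(leaf_pages):
        records = [(sk, pk) for pk, sk in page.tuples]    # index records of page p
        if heap and len(heap) + len(records) > capacity:  # no room for this page
            write_bulk([heapq.heappop(heap) for _ in range(len(heap))])
            checkpoint(cursor)       # Cursor = first key of the page not yet queued
        for rec in records:
            heapq.heappush(heap, rec)
        # Advance Cursor to the first key of the next page (or past the end).
        cursor = (leaf_pages[i + 1].low_key
                  if i + 1 < len(leaf_pages) else float("inf"))
    if heap:                         # force out the final, possibly partial bulk
        write_bulk([heapq.heappop(heap) for _ in range(len(heap))])
        checkpoint(cursor)
```

On restart, the loop is simply re-entered with the leaf pages from the checkpointed Cursor value onward; the bulks already forced to disk and recorded in the checkpoint are kept as they are.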