11.4 Extraction of the Index Records
Algorithm 11.1 is used to scan, in primary-key order, the sparse B-tree primary index of the relation, to extract the index records for the secondary index being constructed, and to store the records onto disk. This is done using repeated calls of find-page-for-bulk-read(p, p′, Cursor, ∞) (see Sect. 10.3), where the shared variable Cursor holds the primary key of the next tuple to be scanned.
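The scan can be sketched as follows. This is a simplified, illustrative skeleton only: it assumes the leaf pages of the primary index are available as in-memory lists of (primary-key, value) tuples, and it simulates the repeated find-page-for-bulk-read calls by plain iteration over the pages in key order; all names other than Cursor are assumptions, not the book's code.

```python
def extract_index_records(leaf_pages, make_index_record):
    """Visit the leaf pages of the primary index in primary-key order,
    extract one secondary-index record per tuple, and advance the
    cursor to the first key of the next page p' after each page p."""
    cursor = float("-inf")      # initial Cursor value: -infinity
    out = []
    for i, page in enumerate(leaf_pages):
        # All tuples in this page have primary keys >= cursor.
        out.extend(make_index_record(t) for t in page)
        if i + 1 < len(leaf_pages):
            # Advance Cursor to the first primary key of the next page.
            cursor = leaf_pages[i + 1][0][0]
        else:
            cursor = float("inf")   # scan ends at key infinity
    return out, cursor
```

A secondary-index record is typically the pair (secondary key, primary key), which is what make_index_record would produce from a tuple.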
We assume that the system-catalog page that stores index information for the
relation contains a variable that tells the state of the secondary index, that is, whether
the index is “under construction,” “available,” or “does not exist.” The updating of
this information is protected by a short-duration write latch on the page containing
the information. The updating of the shared variable Cursor is protected by a write latch kept just for the instant of setting its value.
At the start of the scan, the state of the index in the system catalog is updated
from “does not exist” to “under construction.” For the short duration of this update,
the relation is S-locked so as to prevent an update action by a transaction seeing the
index state as “does not exist” and later another update action (or an undo action) by
the same transaction seeing the state as “under construction.” No locks other than
the short-duration S lock are acquired on the relation or on the tuples read during
the scan.
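The state transition in the system catalog can be sketched as below; a minimal in-memory stand-in, assuming a latch modeled as a mutex and the S lock passed in as a pair of callbacks (all names are illustrative, not the book's code):

```python
import threading

class CatalogPage:
    """Stand-in for the system-catalog page holding the state of the
    secondary index; illustrative only."""
    def __init__(self):
        self.latch = threading.Lock()        # short-duration write latch
        self.index_state = "does not exist"

def begin_index_construction(catalog, s_lock_relation, s_unlock_relation):
    # The relation is S-locked only for the short duration of the state
    # change, so no transaction can see the state both as
    # "does not exist" and, later, as "under construction".
    s_lock_relation()
    try:
        with catalog.latch:                  # protect the catalog update
            assert catalog.index_state == "does not exist"
            catalog.index_state = "under construction"
    finally:
        s_unlock_relation()
```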
The initial value of Cursor is −∞, the lowest primary-key value. During the traversal of the B-tree, pages are read-latched in the usual way, as explained earlier in Chaps. 7 and 10. The scan ends when Cursor reaches the highest primary-key value ∞.
To address the problem of restartability, the relation is scanned in bulks of tuples that fit in main memory; each such bulk is sorted and forced to disk, and a checkpoint record that contains the current value of Cursor and the addresses of the sorted bulks on disk is written to the log. In the event of a failure during the scan, the scan is restarted at the Cursor value found in the most recent checkpoint record on the log disk.
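The restart step can be sketched as a backward-compatible scan of the log for the latest checkpoint record; the record layout ("checkpoint", cursor, bulk addresses) is an assumption made for illustration:

```python
def restart_cursor(log_records):
    """Return the Cursor value at which to restart the scan after a
    failure: the one in the most recent checkpoint record on the log.
    If no checkpoint was logged, restart from the beginning (-infinity)."""
    cursor = float("-inf")                   # initial value of Cursor
    for rec in log_records:
        if rec[0] == "checkpoint":
            # Illustrative layout: ("checkpoint", cursor, bulk_addresses)
            _, cursor, _bulk_addresses = rec
    return cursor
```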
A main-memory priority queue is used to create the sorted bulks of index records. In Algorithm 11.1, after performing the call find-page-for-bulk-read(p, p′, Cursor, ∞), the secondary-index records for the tuples in page p are pushed into the priority queue, and the shared variable Cursor is advanced to the primary key of the first tuple in the page p′ next to page p. Whenever the priority queue has no room for the index records for the tuples in the leaf page p currently being scanned, the records currently in the queue are pulled out and written to a new file, the file is flushed onto disk, and a checkpoint record is logged.
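The priority-queue phase can be sketched with a binary heap. This is a simplified sketch: records are assumed to be comparable (secondary-key, primary-key) pairs, the queue's capacity is a record count rather than a byte budget, records arrive one at a time rather than one page at a time, and the flushed files and the checkpoint logging are stood in for by an in-memory list.

```python
import heapq

def make_sorted_bulks(index_records, bulk_capacity):
    """Partition a stream of secondary-index records into sorted bulks
    of at most bulk_capacity records, using a main-memory priority
    queue as in Algorithm 11.1 (simplified)."""
    queue = []      # main-memory priority queue (binary heap)
    bulks = []      # stands in for the sorted files flushed to disk
    for record in index_records:
        if len(queue) >= bulk_capacity:
            # No room: pull all records out in sorted order and "flush"
            # them as one sorted bulk.  The real algorithm would also
            # log a checkpoint record with the current Cursor value.
            bulks.append([heapq.heappop(queue) for _ in range(len(queue))])
        heapq.heappush(queue, record)
    if queue:
        bulks.append([heapq.heappop(queue) for _ in range(len(queue))])
    return bulks
```

Because each bulk is emitted in heap order, every bulk is internally sorted; the sorted bulks are then ready for the merge phase that produces the final secondary index.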