RECO: Distributed Database Recovery
RECO has a very focused job: it recovers transactions that are left in a prepared state because of a crash or loss of
connection during a two-phase commit (2PC). A 2PC is a distributed protocol that allows a modification that affects
many disparate databases to be committed atomically. It attempts to close the window for distributed failure as much as
possible before committing. In a 2PC among N databases, one of the databases, typically (but not always) the one the
client initially logged in to, acts as the coordinator. The coordinator asks the other N-1 sites to prepare to commit,
and each of those sites reports back its prepared state as YES or NO. If any one of the sites votes NO, the entire
transaction is rolled back. If all sites vote YES, the coordinator broadcasts a message to make the commit permanent on
each of the N-1 sites.
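To make the voting rule concrete, here is a minimal Python sketch of the coordinator's side of a 2PC. It is a toy, in-memory model, not Oracle's implementation: the `Participant` class, its `prepare`/`commit`/`rollback` methods, and the `two_phase_commit` function are hypothetical names, and real sites would exchange these messages over the network and record their prepared state durably.

```python
from dataclasses import dataclass
from enum import Enum


class Vote(Enum):
    YES = "YES"
    NO = "NO"


@dataclass
class Participant:
    """Hypothetical remote site in a distributed transaction (toy model)."""
    name: str
    can_commit: bool = True
    state: str = "active"

    def prepare(self) -> Vote:
        # Phase 1: a site that can commit enters the prepared state and votes YES.
        if self.can_commit:
            self.state = "prepared"
            return Vote.YES
        return Vote.NO

    def commit(self) -> None:
        self.state = "committed"

    def rollback(self) -> None:
        self.state = "rolled back"


def two_phase_commit(participants: list[Participant]) -> str:
    """Hypothetical coordinator: commit only if all N-1 sites vote YES."""
    # Phase 1: ask every participant to prepare and collect the votes.
    votes = [p.prepare() for p in participants]
    if all(v is Vote.YES for v in votes):
        # Phase 2: broadcast the directive to make the commit permanent.
        for p in participants:
            p.commit()
        return "committed"
    # Any NO vote rolls back the entire transaction on every site.
    for p in participants:
        p.rollback()
    return "rolled back"


print(two_phase_commit([Participant("site_b"), Participant("site_c")]))  # committed
print(two_phase_commit([Participant("site_b"), Participant("site_c", can_commit=False)]))  # rolled back
```

Collecting every vote before deciding keeps the sketch short; a real coordinator could stop asking as soon as one site votes NO.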
Suppose a site votes YES and is prepared to commit, but before it receives the directive from the coordinator to actually
commit, the network fails or some other error occurs. The transaction then becomes an in-doubt distributed
transaction. The 2PC tries to limit the window of time in which this can occur, but it cannot remove it. If a failure
happens within that window, the transaction becomes the responsibility of RECO. RECO will try to contact the coordinator of
the transaction to discover its outcome. Until it does, the transaction remains in its uncommitted state. When
the transaction coordinator can be reached again, RECO will either commit the transaction or roll it back.
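The RECO behavior just described can be sketched as a retry loop. This is a simplified, hypothetical model: `InDoubtTxn`, `reco_resolve`, and the `ask_coordinator` callback are illustrative names, and the callback stands in for whatever network call would reach the coordinator. The transaction stays prepared (uncommitted) until the coordinator reports an outcome.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class InDoubtTxn:
    """Hypothetical record of a transaction this site prepared (voted YES on)."""
    txn_id: str
    state: str = "prepared"  # uncommitted until the outcome is known


def reco_resolve(
    txn: InDoubtTxn,
    ask_coordinator: Callable[[str], Optional[str]],
    max_attempts: int = 5,
) -> bool:
    """Retry the coordinator until it reports "commit" or "rollback".

    ask_coordinator is a hypothetical stand-in for contacting the coordinator;
    it returns None while the coordinator cannot be reached.
    Returns True once the transaction has been resolved.
    """
    for _ in range(max_attempts):
        outcome = ask_coordinator(txn.txn_id)
        if outcome == "commit":
            txn.state = "committed"
            return True
        if outcome == "rollback":
            txn.state = "rolled back"
            return True
        # Coordinator unreachable: leave the transaction prepared and try again.
        # A real background process would pause between attempts.
    return False


# Simulate a coordinator that is unreachable twice, then reports "commit".
replies = iter([None, None, "commit"])
txn = InDoubtTxn("txn-1")
print(reco_resolve(txn, lambda _txn_id: next(replies)), txn.state)  # True committed
```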
It should be noted that if the outage persists for an extended period of time and you have some outstanding
transactions, you can commit or roll them back manually. You might want to do this because an in-doubt distributed
transaction can cause writers to block readers; this is the one time this can happen in Oracle. Your DBA could call the
DBA of the other database and ask her to query the status of those in-doubt transactions. Your DBA can then commit
or roll them back, relieving RECO of this task.
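For the manual route, Oracle lists pending distributed transactions in the DBA_2PC_PENDING view, and a DBA can resolve one with COMMIT FORCE or ROLLBACK FORCE, passing its local transaction ID. The Python sketch below only builds those statements from rows the DBA has already fetched plus the outcomes learned from the other site; it does not connect to a database. The helper name `force_statements` and the sample transaction IDs are hypothetical, and the assumption that in-doubt rows show a STATE of `prepared` should be checked against the documentation for your release.

```python
def force_statements(
    pending_rows: list[tuple[str, str]],
    decisions: dict[str, str],
) -> list[str]:
    """Build manual resolution statements for in-doubt transactions.

    pending_rows: (local_tran_id, state) pairs, e.g. fetched with
        SELECT local_tran_id, state FROM dba_2pc_pending;
    decisions: local_tran_id -> "commit" or "rollback", as learned from
        the DBA of the coordinating database.
    Hypothetical helper: it only builds SQL text and executes nothing.
    """
    statements = []
    for tran_id, state in pending_rows:
        # Assumption: in-doubt rows report a STATE of "prepared".
        if state.lower() != "prepared":
            continue
        decision = decisions.get(tran_id)
        if decision == "commit":
            statements.append(f"COMMIT FORCE '{tran_id}'")
        elif decision == "rollback":
            statements.append(f"ROLLBACK FORCE '{tran_id}'")
        # No decision yet: leave the transaction for RECO to resolve.
    return statements


# Sample (made-up) rows; only the two prepared transactions get statements.
rows = [("1.21.17", "prepared"), ("2.5.9", "committed"), ("3.1.4", "prepared")]
for stmt in force_statements(rows, {"1.21.17": "commit", "3.1.4": "rollback"}):
    print(stmt)
```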
CKPT: Checkpoint Process
Despite its name, the checkpoint process doesn't do a checkpoint (checkpoints were discussed in Chapter 3 in the
section on redo logs); that's mostly the job of DBWn. It simply assists with the checkpointing process by updating
the file headers of the data files. CKPT used to be an optional process, but starting with version 8.0 of the
database it is always started, so if you do a ps on UNIX/Linux, you'll normally see it there. (I say “normally” because as
of Oracle 12c, it's possible for the checkpoint process to run as an operating system thread rather than as a separate
process, in which case it won't be displayed as a process.)
The job of updating the data files' headers with checkpoint information used to belong to LGWR; however, as the
number of files increased along with the size of databases over time, this additional task became too much of
a burden for LGWR. If LGWR had to update dozens, hundreds, or even thousands of files, there would be a good chance that
sessions waiting to commit their transactions would have to wait far too long. CKPT removes this responsibility from LGWR.
DBWn: Database Block Writer
The database block writer (DBWn) is the background process responsible for writing dirty blocks to disk. DBWn writes
dirty blocks from the buffer cache, usually to make more room in the cache (to free buffers for reads of other data) or
to advance a checkpoint (to move forward the position in an online redo log file from which Oracle would have to start
reading to recover the instance in the event of failure). As we discussed in Chapter 3, when Oracle switches log files,
a checkpoint is signaled. Oracle needs to advance the checkpoint so that it no longer needs the online redo log file it
just filled up. If it hasn't been able to do that by the time we need to reuse that redo log file, we get the “checkpoint not
complete” message and we must wait.
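The log-switch scenario can be illustrated with a toy model of circular redo log groups. In this hypothetical sketch (`RedoLogRing` and its methods are illustrative names, not an Oracle API), each group remembers which blocks were dirtied while it was current, and a switch into a group is refused until DBWn has written those blocks, which stands in for the “checkpoint not complete” wait. It deliberately ignores incremental checkpoints and everything else that can affect log reuse.

```python
class RedoLogRing:
    """Toy model (hypothetical, not an Oracle API) of circular redo log groups.

    Each group records the blocks dirtied while it was the current group.
    Until DBWn has written those blocks, the checkpoint covering that group
    is incomplete and the group cannot be reused.
    """

    def __init__(self, n_groups: int) -> None:
        self.dirty_by_group: list[set[int]] = [set() for _ in range(n_groups)]
        self.current = 0

    def record_change(self, block_id: int) -> None:
        # Redo for the change goes to the current group; the block is now dirty.
        self.dirty_by_group[self.current].add(block_id)

    def dbwn_write(self, block_id: int) -> None:
        # Writing the block's current image to disk means that, in this model,
        # no group has to be kept around for it any more.
        for dirty in self.dirty_by_group:
            dirty.discard(block_id)

    def switch_log(self) -> bool:
        """Switch to the next group, or return False if we must wait for it."""
        nxt = (self.current + 1) % len(self.dirty_by_group)
        if self.dirty_by_group[nxt]:
            return False  # the "checkpoint not complete" situation
        self.current = nxt
        return True


ring = RedoLogRing(n_groups=2)
ring.record_change(10)        # block 10 dirtied while group 0 is current
print(ring.switch_log())      # True: move to group 1
ring.record_change(20)
print(ring.switch_log())      # False: group 0 still covers unwritten block 10
ring.dbwn_write(10)           # DBWn flushes block 10
print(ring.switch_log())      # True: group 0 can now be reused
```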
Note: Advancing log files is only one of many ways for checkpoint activity to occur. There are incremental
checkpoints controlled by parameters such as FAST_START_MTTR_TARGET, as well as other triggers that cause dirty blocks
to be flushed to disk.