Oracle Processes - Expert Oracle Database Architecture


RECO: Distributed Database Recovery

RECO has a very focused job: it recovers transactions that are left in a prepared state because of a crash or loss of connection during a two-phase commit (2PC). A 2PC is a distributed protocol that allows a modification that affects many disparate databases to be committed atomically. It attempts to close the window for distributed failure as much as possible before committing. In a 2PC between N databases, one of the databases, typically (but not always) the one the client logged into initially, will be the coordinator. This one site will ask the other N-1 sites if they are ready to commit; in effect, it asks each of them to be prepared to commit. Each of the N-1 sites reports back its prepared state as YES or NO. If any one of the sites votes NO, the entire transaction is rolled back. If all sites vote YES, the coordinator broadcasts a message to make the commit permanent on each of the N-1 sites.
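
To make the voting rule concrete, here is a minimal Python sketch of the coordinator's decision. It is a toy in-memory model, not Oracle's implementation: the Participant class, its will_vote_yes flag, and the state strings are all hypothetical, and real sites exchange these messages over the network and record them durably.

```python
class Participant:
    """Hypothetical stand-in for one of the N-1 sites (not an Oracle API)."""

    def __init__(self, name, will_vote_yes=True):
        self.name = name
        self.will_vote_yes = will_vote_yes
        self.state = "ACTIVE"

    def prepare(self):
        # Phase 1: vote YES (promising to be able to commit) or vote NO.
        if self.will_vote_yes:
            self.state = "PREPARED"
            return "YES"
        return "NO"

    def finish(self, outcome):
        # Phase 2: apply the coordinator's global decision.
        self.state = outcome


def two_phase_commit(participants):
    """Coordinator: commit everywhere only if every site votes YES."""
    votes = [site.prepare() for site in participants]
    outcome = "COMMITTED" if all(v == "YES" for v in votes) else "ROLLED BACK"
    for site in participants:
        site.finish(outcome)
    return outcome


if __name__ == "__main__":
    sites = [Participant("east"), Participant("west", will_vote_yes=False)]
    print(two_phase_commit(sites))          # ROLLED BACK
    print([site.state for site in sites])   # ['ROLLED BACK', 'ROLLED BACK']
```

For simplicity the sketch polls every site even after a NO; since a single NO already decides the outcome, a coordinator could stop asking at that point.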

Suppose a site votes YES and is prepared to commit, but the network fails or some other error occurs before it receives the directive from the coordinator to actually commit. The transaction then becomes an in-doubt distributed transaction. The 2PC tries to limit the window of time in which this can occur, but cannot remove it. If there is a failure right then and there, the transaction will become the responsibility of RECO. RECO will try to contact the coordinator of the transaction to discover its outcome. Until it does that, the transaction will remain in its uncommitted state. When the transaction coordinator can be reached again, RECO will either commit the transaction or roll it back.
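
The following toy model sketches RECO's side of the story under stated assumptions: a local site holds a transaction in the PREPARED state, the coordinator remembers the global outcome, and each call to the hypothetical reco_pass function stands in for one attempt by RECO to reach the coordinator. The class, function, and state names are illustrative, not Oracle's.

```python
class Coordinator:
    """Hypothetical coordinator that remembers each transaction's outcome."""

    def __init__(self, outcomes):
        self.outcomes = outcomes   # e.g. {"tx1": "COMMITTED"}
        self.reachable = False     # False simulates the network outage

    def outcome_of(self, tx_id):
        if not self.reachable:
            raise ConnectionError("coordinator unreachable")
        return self.outcomes[tx_id]


def reco_pass(local_states, coordinator):
    """One recovery attempt: resolve every PREPARED transaction we can.

    local_states maps tx_id -> state and is updated in place.
    Returns the ids that are still in doubt after this pass.
    """
    still_in_doubt = []
    for tx_id, state in list(local_states.items()):
        if state != "PREPARED":
            continue
        try:
            # Commit or roll back exactly as the coordinator decided.
            local_states[tx_id] = coordinator.outcome_of(tx_id)
        except ConnectionError:
            still_in_doubt.append(tx_id)   # stays uncommitted; retry later
    return still_in_doubt


if __name__ == "__main__":
    local = {"tx1": "PREPARED"}
    coord = Coordinator({"tx1": "COMMITTED"})
    print(reco_pass(local, coord))   # ['tx1']  (outage: still in doubt)
    coord.reachable = True
    print(reco_pass(local, coord))   # []
    print(local)                     # {'tx1': 'COMMITTED'}
```

The key design point is that the participant never decides on its own: once prepared, it can only wait for, and then apply, the coordinator's decision.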

It should be noted that if the outage persists for an extended period of time and you have some outstanding transactions, you can commit or roll them back manually. You might want to do this because an in-doubt distributed transaction can cause writers to block readers; this is the one time this can happen in Oracle. Your DBA could call the DBA of the other database and ask her to query the status of those in-doubt transactions. Your DBA can then commit or roll them back, relieving RECO of this task.
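
In Oracle, the manual route goes through the DBA_2PC_PENDING data dictionary view (its LOCAL_TRAN_ID and STATE columns) and the COMMIT FORCE 'tran_id' and ROLLBACK FORCE 'tran_id' statements. The Python sketch below only builds those statements from rows the DBA has already fetched, once the outcome has been confirmed with the other site. The force_statements helper, its input format, and the sample transaction ids are hypothetical, and nothing here connects to a database.

```python
def force_statements(pending_rows, confirmed_outcomes):
    """Build COMMIT FORCE / ROLLBACK FORCE statements for in-doubt transactions.

    pending_rows: (local_tran_id, state) pairs, e.g. as fetched from
        DBA_2PC_PENDING (fetching them is outside this sketch).
    confirmed_outcomes: local_tran_id -> "COMMIT" or "ROLLBACK", as learned
        from the other site's DBA (hypothetical input format).
    """
    statements = []
    for tran_id, state in pending_rows:
        if state.lower() != "prepared":
            continue                  # skip anything not in the prepared state
        outcome = confirmed_outcomes.get(tran_id)
        if outcome is None:
            continue                  # outcome not confirmed yet: leave it to RECO
        if outcome not in ("COMMIT", "ROLLBACK"):
            raise ValueError(f"unexpected outcome {outcome!r} for {tran_id}")
        # The transaction id is quoted as a string literal in the statement.
        statements.append(f"{outcome} FORCE '{tran_id}'")
    return statements


if __name__ == "__main__":
    rows = [("1.21.17", "prepared"), ("2.5.9", "prepared"), ("3.1.4", "prepared")]
    outcomes = {"1.21.17": "COMMIT", "2.5.9": "ROLLBACK"}
    for stmt in force_statements(rows, outcomes):
        print(stmt)
```

Forcing the wrong outcome leaves the sites inconsistent with each other, which is why the outcome should be confirmed with the other database before any statement is run.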

CKPT: Checkpoint Process

Despite what its name implies, the checkpoint process doesn't do a checkpoint (checkpoints were discussed in Chapter 3 in the section on redo logs); that's mostly the job of DBWn. It simply assists with the checkpointing process by updating the file headers of the data files. It used to be that CKPT was an optional process, but starting with version 8.0 of the database, it is always started, so if you do a ps on UNIX/Linux, you'll normally see it there. (I say “normally” because as of Oracle 12c, it's possible for the checkpoint process to run within an operating system thread, in which case it won't be displayed as a separate process.)
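
On UNIX/Linux, Oracle background processes normally show up in ps output with names of the form ora_<process>_<SID>, so CKPT for an instance named ORCL would appear as ora_ckpt_ORCL. The small Python helper below scans `ps -ef` output for such a name; the function name and the sample output are hypothetical. An empty result may simply mean the instance isn't running, or that CKPT is running as a thread rather than a process.

```python
import re


def find_background_pids(ps_output, process, sid):
    """Return PIDs whose command is exactly ora_<process>_<sid> in `ps -ef` output."""
    pattern = re.compile(rf"ora_{re.escape(process)}_{re.escape(sid)}")
    pids = []
    for line in ps_output.splitlines()[1:]:    # skip the UID/PID/... header
        fields = line.split(None, 7)           # UID PID PPID C STIME TTY TIME CMD
        if len(fields) == 8 and pattern.fullmatch(fields[7].strip()):
            pids.append(int(fields[1]))
    return pids


if __name__ == "__main__":
    # Hypothetical `ps -ef` output, trimmed to three Oracle processes.
    sample = """UID        PID  PPID  C STIME TTY          TIME CMD
oracle    2101     1  0 09:12 ?        00:00:00 ora_pmon_ORCL
oracle    2133     1  0 09:12 ?        00:00:01 ora_ckpt_ORCL
oracle    2140     1  0 09:12 ?        00:00:00 ora_ckpt_ORCL2"""
    print(find_background_pids(sample, "ckpt", "ORCL"))   # [2133]
```

Matching the whole command name, rather than a substring, keeps an instance named ORCL from also matching ORCL2. To scan a live host instead of the sample, pass the output of `subprocess.run(["ps", "-ef"], capture_output=True, text=True).stdout`.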

The job of updating data files' headers with checkpoint information used to belong to LGWR; however, as the number of files increased along with the size of databases over time, this additional task became too much of a burden for LGWR. If LGWR had to update dozens, hundreds, or even thousands of files, there would be a good chance that sessions waiting to commit their transactions would have to wait far too long. CKPT removes this responsibility from LGWR.
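
A back-of-the-envelope model shows why this matters. Assume, purely for illustration, that a commit's redo write costs 1 ms and stamping one data file header costs 0.5 ms (both figures are made up, not measurements). If a commit has to queue behind LGWR while it updates every header, the wait grows linearly with the number of files; with CKPT doing that work, the commit path no longer depends on the file count. The function and its parameters are hypothetical.

```python
def commit_wait_ms(redo_write_ms, n_files, header_update_ms, lgwr_updates_headers):
    """Toy model of the wait seen by a commit that queues behind LGWR.

    If LGWR also had to stamp checkpoint information into every data file
    header, that work would sit in front of the commit; if CKPT does it,
    only the redo write remains on the commit path.
    """
    header_work = n_files * header_update_ms if lgwr_updates_headers else 0.0
    return redo_write_ms + header_work


if __name__ == "__main__":
    for files in (10, 100, 1000):
        old = commit_wait_ms(1.0, files, 0.5, lgwr_updates_headers=True)
        new = commit_wait_ms(1.0, files, 0.5, lgwr_updates_headers=False)
        print(f"{files:>5} files: LGWR does headers {old:7.1f} ms, CKPT does them {new:4.1f} ms")
```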

DBWn: Database Block Writer

The database block writer (DBWn) is the background process responsible for writing dirty blocks to disk. DBWn will write dirty blocks from the buffer cache, usually to make more room in the cache (to free buffers for reads of other data) or to advance a checkpoint (to move forward the position in an online redo log file from which Oracle would have to start reading to recover the instance in the event of failure). As we discussed in Chapter 3, when Oracle switches log files, a checkpoint is signaled. Oracle needs to advance the checkpoint so that it no longer needs the online redo log file it just filled up. If it hasn't been able to do that by the time we need to reuse that redo log file, we get the “checkpoint not complete” message and we must wait.
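
The following Python toy model sketches this interaction under simplifying assumptions: a fixed ring of online redo log groups, blocks identified by name, and a rule that a log group may be reused only once every dirty block with changes recorded in it has been written by DBWn. The RedoRing class and its methods are hypothetical; real checkpointing tracks redo positions rather than per-group sets of blocks.

```python
class RedoRing:
    """Toy ring of online redo log groups (a sketch, not Oracle's algorithm).

    dirty[g] is the set of buffer-cache blocks that have changes recorded in
    group g and have not yet been written to the data files by DBWn.
    """

    def __init__(self, n_groups):
        self.dirty = [set() for _ in range(n_groups)]
        self.current = 0
        self.waits = 0          # times we hit "checkpoint not complete"

    def modify(self, block):
        # The change is logged in the current group; the block is now dirty.
        self.dirty[self.current].add(block)

    def _write(self, block):
        # Writing a block persists every logged change to it at once.
        for blocks in self.dirty:
            blocks.discard(block)

    def dbwn_write(self, max_blocks):
        """DBWn writes up to max_blocks dirty blocks, oldest log group first."""
        written = 0
        n = len(self.dirty)
        for step in range(1, n + 1):
            group = (self.current + step) % n   # oldest group first, current last
            for block in sorted(self.dirty[group]):
                if written == max_blocks:
                    return written
                self._write(block)
                written += 1
        return written

    def switch_log(self):
        """Move to the next group, waiting if it still protects dirty blocks."""
        nxt = (self.current + 1) % len(self.dirty)
        if self.dirty[nxt]:
            self.waits += 1                     # "checkpoint not complete"
            for block in sorted(self.dirty[nxt]):
                self._write(block)              # stands in for waiting on DBWn
        self.current = nxt


if __name__ == "__main__":
    ring = RedoRing(n_groups=2)
    ring.modify("A")
    ring.switch_log()      # group 1 was empty: no wait
    ring.modify("B")
    ring.switch_log()      # group 0 still protects dirty block A
    print(ring.waits)      # 1
```

If DBWn gets a chance to call dbwn_write between the two switches, block A is written first (it belongs to the oldest group) and the second switch proceeds without waiting.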

■ Note  Advancing log files is only one of many ways for checkpoint activity to occur. There are incremental checkpoints controlled by parameters such as FAST_START_MTTR_TARGET, and other triggers that cause dirty blocks to be flushed to disk.
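
The idea behind incremental checkpointing can be sketched as a checkpoint queue: dirty blocks ordered by the redo position of their first unwritten change. Recovery would have to start at the head of that queue, so writing blocks from the head shrinks the amount of redo to replay. In the hypothetical Python model below, redo positions are plain integers and the target is an amount of redo; the real FAST_START_MTTR_TARGET is expressed in seconds and Oracle translates it into its own internal estimates, so treat this as an illustration of the principle only.

```python
import heapq


class CheckpointQueue:
    """Toy checkpoint queue: dirty blocks ordered by first-change redo position.

    Positions are plain integers standing in for redo byte addresses.
    """

    def __init__(self):
        self.heap = []        # (low_position, block), oldest first
        self.low = {}         # block -> position of its first unwritten change
        self.current = 0      # end of the redo generated so far

    def log_change(self, block, redo_bytes):
        start = self.current
        self.current += redo_bytes
        # Only the first change since the block was last written matters.
        if block not in self.low:
            self.low[block] = start
            heapq.heappush(self.heap, (start, block))

    def checkpoint_position(self):
        # Crash recovery would have to start reading redo here.
        return self.heap[0][0] if self.heap else self.current

    def redo_to_apply(self):
        return self.current - self.checkpoint_position()

    def incremental_checkpoint(self, target):
        """Write oldest-dirtied blocks until redo_to_apply() <= target."""
        written = []
        while self.heap and self.redo_to_apply() > target:
            _, block = heapq.heappop(self.heap)
            del self.low[block]
            written.append(block)
        return written


if __name__ == "__main__":
    q = CheckpointQueue()
    for block in ("A", "B", "A", "C"):
        q.log_change(block, 100)
    print(q.redo_to_apply())              # 400
    print(q.incremental_checkpoint(150))  # ['A', 'B']
    print(q.redo_to_apply())              # 100
```

Writing from the head of the queue is what lets the checkpoint position advance; writing a recently dirtied block instead would free a buffer but leave the recovery start point where it was.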
