Instance Recovery
Instance recovery restores the database to a consistent state after an instance crashes in the middle of user activity.
Unlike in a traditional single-instance database, recovery of an instance in a RAC environment is dynamic and happens
while the database is up and active. It is probably the most important aspect of recovery that applies to RAC. The idea
of having multiple nodes in a clustered configuration is to provide availability, with the assumption that if one or more
instances in the cluster were to fail, the remaining instances would provide business continuity. For this reason,
instance recovery becomes more critical.
One of the primary requirements of a RAC configuration is to have the redo logs of all instances participating in
the cluster on the shared storage. The primary reason for such a requirement is to provide visibility of the redo logs
of any instance in the cluster to all other instances. This allows for any instance in the cluster to perform an instance
recovery operation during an instance failure.
Instance failure can happen in several ways; the most common cause is failure of the node itself. A node can
fail for several reasons, including a power surge, operator error, and so forth.
An instance can also fail because a background process fails or dies, or because the instance
encounters a kernel-level exception that raises an ORA-00600 or ORA-07445 error. Issuing a
SHUTDOWN ABORT command also causes an instance failure.
Instance failures can be of different kinds:
The instance is totally down and the users do not have any access to the instance.
The instance is up; however, connection attempts hang or the user gets no response.
When an instance is not available, users can continue accessing the database through one of the other
surviving instances in an active-active configuration, provided the failover option has been enabled in the application.
Recovery from an instance failure is performed by another instance in the cluster that is up and running:
the one whose heartbeat mechanism detected the failure first and informed the LMON process on
its node. The LMON process on each cluster node communicates with the cluster manager (CM) on that node and exposes that
information to the respective instances.
LMON provides the monitoring function by continually sending messages from the node on which it runs and often
by writing to the shared disk. When a node fails to perform these functions, the other nodes consider that node to be no
longer a member of the cluster. Such a failure causes a change in the node's membership status within the cluster.
The LMON process controls the recovery of the failed instance by taking over its redo log files and performing
instance recovery.
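The heartbeat-based membership check described above can be illustrated with a small model. This is a minimal sketch, not Oracle's implementation: the class name MembershipMonitor, the timeout value, and the node names are all hypothetical, and time is passed in explicitly so the example is deterministic.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class MembershipMonitor:
    """Toy model of heartbeat-based cluster membership (hypothetical, not Oracle code)."""
    timeout: float  # seconds of silence after which a node is treated as failed
    last_seen: Dict[str, float] = field(default_factory=dict)

    def heartbeat(self, node: str, now: float) -> None:
        # Record the most recent time a heartbeat arrived from this node.
        self.last_seen[node] = now

    def members(self, now: float) -> Set[str]:
        # Nodes whose last heartbeat falls within the timeout window.
        return {n for n, t in self.last_seen.items() if now - t <= self.timeout}

    def failed(self, now: float) -> Set[str]:
        # Nodes seen at least once that have since gone silent for too long.
        return set(self.last_seen) - self.members(now)


monitor = MembershipMonitor(timeout=3.0)
monitor.heartbeat("node1", now=0.0)
monitor.heartbeat("node2", now=0.0)
monitor.heartbeat("node1", now=4.0)  # node2 sends nothing further

# Evaluated at t=5.0: node2 has been silent for longer than the timeout.
failed_nodes = monitor.failed(now=5.0)
```

In this model a surviving node would treat every entry in failed_nodes as a membership change and start recovery for the corresponding instance.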
How Does Oracle Know That Recovery Is Required for a Given Data File?
The system change number (SCN) is a logical clock inside the database kernel that increments with each
change made to the database. An SCN identifies a committed “version” of the database. When a
database performs a checkpoint operation, an SCN (called the checkpoint SCN) is written to the data file headers.
This is called the start SCN. There is also an SCN value in the control file for every data file, which is called the stop
SCN. Another data structure, the checkpoint counter, is kept in each data file header and also in the control file
entry for each data file. The checkpoint counter increments every time a checkpoint happens on a data file and the
start SCN value is updated. When a data file is in hot backup mode, the checkpoint information in the file header is
frozen, but the checkpoint counter is still updated.
When the database is shut down gracefully, with the SHUTDOWN NORMAL or SHUTDOWN IMMEDIATE command,
Oracle performs a checkpoint and copies the start SCN value of each data file to its corresponding stop SCN value in
the control file before the actual shutdown of the database.
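The start SCN and stop SCN bookkeeping can be sketched as a small model. This is an illustration only, under stated assumptions: the DataFile class and the function names are hypothetical, and the model assumes that the stop SCN is left unset (None here) while a data file is open, so a crash leaves it without a matching value.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DataFile:
    """Toy model of the per-file SCN bookkeeping (hypothetical names)."""
    name: str
    start_scn: int                   # checkpoint SCN stored in the data file header
    stop_scn: Optional[int] = None   # stop SCN in the control file; None = unset (assumption)


def open_database(files: List[DataFile]) -> None:
    # Assumption: opening the database clears the stop SCN of every data file.
    for f in files:
        f.stop_scn = None


def checkpoint(files: List[DataFile], scn: int) -> None:
    # A checkpoint advances the start SCN in each data file header.
    for f in files:
        f.start_scn = scn


def clean_shutdown(files: List[DataFile], scn: int) -> None:
    # SHUTDOWN NORMAL / IMMEDIATE: checkpoint, then copy start SCN to stop SCN.
    checkpoint(files, scn)
    for f in files:
        f.stop_scn = f.start_scn


def needs_recovery(f: DataFile) -> bool:
    # Recovery is required when the stop SCN is unset or differs from the start SCN.
    return f.stop_scn is None or f.stop_scn != f.start_scn


files = [DataFile("system01.dbf", 1000), DataFile("users01.dbf", 1000)]

open_database(files)
clean_shutdown(files, scn=1200)
after_clean = [needs_recovery(f) for f in files]

open_database(files)
checkpoint(files, scn=1300)
# The instance crashes here: no stop SCN is ever written.
after_crash = [needs_recovery(f) for f in files]
```

After the clean shutdown both files have matching start and stop SCNs, so no recovery is flagged; after the simulated crash the stop SCN is still unset, so every file is flagged.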
 