Chapter 10
Tuning Recovery
Every system is prone to failure, whether the cause is natural, mechanical, or electronic. This applies to the human
body, automobiles, computer hardware, elevators, application servers, applications, database servers, databases, and
network connectivity. Depending on how critical the item is and how heavily it is used every day, such failures call for
an alternative way to provide the required service and/or a method to keep the systems up and functioning. For example,
the human body can fail due to sickness, and the sickness can be simple like a fever or complex like a heart attack.
The immediate need in this situation is to visit a doctor and get treated. Treatment helps control the situation
and gets the body functioning again. An automobile can fail, and the cause could be as simple as a flat tire.
A backup option in this case is a spare tire and the essential tools needed to replace the tire. In some unavoidable
conditions, an alternative method of transportation has to be used, for example, a bus or taxi. Electronic devices
such as computer hardware are also prone to failure; hardware components come in many forms and together make up the
entire enterprise configuration. Normally, protection against hardware failures is achieved by providing redundancy at
all tiers of the configuration. This helps because when one component fails, the redundant component continues operation.
On the database side, the storage system that physically stores the data needs to be protected. An example is
mirroring the disk, where the data is copied to another disk to provide safety and failover when a disk in the array fails.
This provides the required redundancy against disk failures.
What happens when a privileged user accidentally deletes rows from a table in a production database? What
happens when this damage is noticed only a few days after the accident occurred? What happens when lightning hits
the production center and the electric grid, causing a short circuit that damages the entire storage subsystem? In all
these situations, redundant hardware architecture alone is not enough. An additional method is required to resolve
the problem, namely, a process to retrieve and recover the lost data.
The answer is that a copy of the data needs to be saved regularly to another medium and stored in a remote
location. Storing data this way protects the enterprise from losing its valuable data. The method of
copying data from a live system for storage in a remote location is called a backup process.
Backing up the database and related datafiles is not sufficient by itself; when issues arise, administrators should be
able to restore and recover the database quickly and easily. As database sizes grow larger and larger, simple
backup techniques or the media used to store backups may not be sufficient to meet the SLA requirements of the business.
Recovery of a database should be efficient and optimized for performance to make the environment highly available.
After all, if recovery were never a concern and databases were always safe from data loss, why would we need to make
a backup of the data? The end goal of a backup is to ensure that the database can be recovered.
In a RAC environment, multiple instances provide access to the data, which keeps the environment available. However,
servers or instances in a RAC environment are also prone to failure, and recovery of a failed instance is critical to
make the changes made by its users available to the other instances in the cluster.
In a RAC environment, there are primarily two types of recovery scenarios: instance recovery and
media recovery. When all instances in a RAC environment crash, the underlying method of recovery is still
instance-level recovery, but the terminology used is crash recovery.
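
The terminology above can be summarized in a small sketch. This is only an illustration of how the three terms relate,
not an Oracle API: the function name classify_recovery and its parameters are hypothetical, and the rule that damaged
storage calls for media recovery is an assumption drawn from the storage-failure scenario described earlier.

```python
def classify_recovery(total_instances, failed_instances, media_damaged):
    """Return the recovery term that applies to a RAC failure scenario.

    All names here are hypothetical and exist only to illustrate the terminology.
    """
    if total_instances < 1:
        raise ValueError("a cluster needs at least one instance")
    if not 0 <= failed_instances <= total_instances:
        raise ValueError("failed_instances must be between 0 and total_instances")

    # Assumption: lost or damaged storage requires restoring from a backup,
    # regardless of how many instances are still running.
    if media_damaged:
        return "media recovery"
    if failed_instances == 0:
        return "no recovery needed"
    # Every instance is down: still instance-level recovery, but called crash recovery.
    if failed_instances == total_instances:
        return "crash recovery"
    # At least one instance survives and can recover the failed ones.
    return "instance recovery"


if __name__ == "__main__":
    print(classify_recovery(4, 1, False))
    print(classify_recovery(4, 4, False))
    print(classify_recovery(4, 0, True))
```

With a four-node cluster, one failed instance is classified as instance recovery, four failed instances as crash
recovery, and damaged storage as media recovery.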
 