Tuning Recovery - Expert Oracle RAC Performance Diagnostics and Tuning

Chapter 10

Tuning Recovery

Every system is prone to failure, whether natural, mechanical, or electronic: the human body, automobiles, computer hardware, elevators, application servers, applications, database servers, databases, and network connectivity. Depending on how critical the item is and how heavily it is used every day, such failures call for an alternative way to provide the required service and/or a method to keep the system up and functioning. For example, the human body can fail due to sickness, and the sickness can be simple, like a fever, or complex, like a heart attack. The immediate need in this situation is to visit a doctor and get treated. Treatment helps control the situation and gets the body functioning again. An automobile can fail too, possibly from something as simple as a flat tire. The backup option in this case is a spare tire and the essential tools needed to replace the tire. In some unavoidable conditions, an alternative method of transportation has to be used, for example, a bus or taxi. Electronic devices such as computer hardware are also prone to failure; hardware comes in many forms that together make up the entire enterprise configuration. Normally, protection against hardware failures is achieved by providing redundancy at all tiers of the configuration, so that when one component fails, another can continue the operation. On the database side, the storage system that physically stores the data needs to be protected. An example is disk mirroring, where the data is copied to another disk to provide safety and failover when a disk in the array fails. This provides the required redundancy against disk failures.

What happens when a privileged user accidentally deletes rows from a table in a production database? What happens when this damage is noticed only a few days after the accident occurred? What happens when lightning hits the production center and the electric grid, causing a short circuit that damages the entire storage subsystem? In all these situations, redundant hardware alone is not enough; an additional method is required to resolve the problem, namely, a process to retrieve and recover the lost data.

The answer is that a copy of the data needs to be saved regularly to other media and stored in a remote location. Storing data this way protects the enterprise from losing its valuable data. The method of copying data from a live system for storage in a remote location is called the backup process.

Backing up the database and related datafiles is simply not sufficient; when issues arise, administrators should be able to restore and recover the database easily and quickly. As databases grow larger and larger, simple backup techniques, or the media used to store the backups, may not be sufficient to meet the SLA requirements of the business. Recovery of a database should be efficient and optimized for performance to keep the environment highly available. After all, if recovery were never a concern and databases were always safe from data loss, why would we need to make a backup of the data? The end goal, therefore, is to ensure that the database can be recovered.

In a RAC environment, multiple instances provide access to the data, which gives the environment its availability. However, servers and instances in a RAC environment are also prone to failure, and recovery of failed instances is critical to make changes made by users available to the other instances in the cluster.

In a RAC environment, there are primarily two types of recovery scenarios: instance recovery and media recovery. However, when all instances in a RAC environment crash, the underlying recovery method is still instance-level recovery, but the terminology used is crash recovery.
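
The terminology above can be summarized as a small decision rule. The following Python sketch is illustrative only: the names `RecoveryType` and `classify_recovery` are hypothetical and are not part of any Oracle tool or API. It assumes that media recovery applies whenever files on storage are lost or damaged, regardless of how many instances are still running.

```python
from enum import Enum


class RecoveryType(Enum):
    NONE = "no recovery needed"
    INSTANCE = "instance recovery"
    CRASH = "crash recovery"
    MEDIA = "media recovery"


def classify_recovery(total_instances, failed_instances, media_damaged=False):
    """Return the recovery term used in the text for a failure scenario.

    Hypothetical helper for illustration; it only names the scenario
    and performs no recovery itself.
    """
    if total_instances < 1 or not 0 <= failed_instances <= total_instances:
        raise ValueError("failed_instances must be between 0 and total_instances")
    if media_damaged:
        # Assumption: lost or damaged files on storage call for media recovery.
        return RecoveryType.MEDIA
    if failed_instances == 0:
        return RecoveryType.NONE
    if failed_instances == total_instances:
        # Every instance in the cluster is down: same mechanism, different name.
        return RecoveryType.CRASH
    # At least one surviving instance remains to recover the failed ones.
    return RecoveryType.INSTANCE
```

For example, `classify_recovery(3, 1)` describes one failed instance in a three-node cluster, while `classify_recovery(3, 3)` describes the case where the whole cluster has gone down.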
