Overview of Oracle RAC - Expert Oracle RAC 12c

What Is High Availability?

As the earlier online store example showed, businesses press their IT departments to provide solutions
that meet the availability requirements of business applications. Because the database is the centerpiece
of most business applications, database availability is the key to keeping those applications available.

In most IT organizations, Service Level Agreements (SLAs) define the application availability agreed
between the business and the IT organization. An SLA can be expressed as a percentage availability or as
the maximum downtime allowed per month or per year. For example, an SLA that specifies 99.999% availability
allows no more than about 5.26 minutes of downtime per year. Sometimes an SLA also specifies a particular
time window in which downtime is allowed; for example, a back-end office application database can be down
between midnight and 4 a.m. on the first Saturday of each quarter for scheduled maintenance such as
hardware and software upgrades.
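
The arithmetic behind the 99.999% figure can be checked with a short script. This is only a minimal
sketch: the function name allowed_downtime_minutes and the use of a 365-day year are assumptions made
for illustration, not part of any Oracle tool.

```python
# Minutes in a 365-day year; leap years are ignored (assumption for illustration).
MINUTES_PER_YEAR = 365 * 24 * 60


def allowed_downtime_minutes(availability_pct: float,
                             period_minutes: float = MINUTES_PER_YEAR) -> float:
    """Return the maximum downtime, in minutes, that an availability percentage permits."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability_pct must be between 0 and 100")
    return (1 - availability_pct / 100) * period_minutes


if __name__ == "__main__":
    # Print the annual downtime budget for some commonly quoted SLA levels.
    for pct in (99.0, 99.9, 99.99, 99.999):
        minutes = allowed_downtime_minutes(pct)
        print(f"{pct}% availability -> {minutes:.2f} minutes of downtime per year")
```

For 99.999% the script reports about 5.26 minutes per year, matching the figure above. Passing a
different period_minutes value gives the budget for a month or a quarter instead.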

Since most high availability (HA) solutions require additional hardware and/or software, the cost of these
solutions can be high. Companies should determine their HA requirements based on the nature of the
applications and the cost structure. For example, some back-end office applications, such as a human
resources application, may not need to be online 24x7. For mission-critical business applications that do
need to be highly available, the cost of downtime should be evaluated as well; for example, how much money
would be lost during 1 hour of downtime. We can then compare the downtime costs with the capital costs and
operational expenses associated with designing and implementing various levels of availability solution.
This comparison helps business managers and IT departments arrive at realistic SLAs that meet their real
business needs, fit their budget, and can be delivered by their IT team.
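
The comparison of downtime cost against solution cost can be sketched as follows. Every number and option
name here is hypothetical and chosen only to make the arithmetic visible; real availability levels and
costs have to come from your own measurements and vendor quotes.

```python
from dataclasses import dataclass

HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years ignored


@dataclass
class HaOption:
    name: str
    availability_pct: float  # availability the option is assumed to deliver
    annual_cost: float       # annualized capital plus operational cost of the option


def expected_annual_total(option: HaOption, downtime_cost_per_hour: float) -> float:
    """Annual solution cost plus the expected cost of the downtime the option still allows."""
    downtime_hours = (1 - option.availability_pct / 100) * HOURS_PER_YEAR
    return option.annual_cost + downtime_hours * downtime_cost_per_hour


# Hypothetical options and figures, for illustration only.
options = [
    HaOption("single server", 99.0, 0.0),
    HaOption("two-node cluster", 99.9, 150_000.0),
    HaOption("cluster plus standby site", 99.99, 400_000.0),
]
DOWNTIME_COST_PER_HOUR = 10_000.0  # hypothetical business loss per hour of downtime

cheapest = min(options, key=lambda o: expected_annual_total(o, DOWNTIME_COST_PER_HOUR))

if __name__ == "__main__":
    for option in options:
        total = expected_annual_total(option, DOWNTIME_COST_PER_HOUR)
        print(f"{option.name}: expected annual total {total:,.0f}")
    print(f"lowest expected total: {cheapest.name}")
```

With these made-up inputs the middle option has the lowest expected total, which illustrates the point of
the comparison: the most available solution is not automatically the most economical one.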

Many business applications are multi-tier applications that run on multiple computers in a distributed
network. The availability of a business application depends not only on the infrastructure that supports
it, including the server hardware, storage, network, and OS, but also on each application tier, such as
the web servers, application servers, and database servers. In this chapter, I will focus mainly on the
availability of the database server, which is the database administrator's responsibility.
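
One way to see why every tier matters is to multiply the availabilities of the tiers together. The sketch
below assumes that the application needs all tiers to be up and that tiers fail independently of one
another; both are simplifying assumptions, and the tier names and 99.9% figures are hypothetical.

```python
from math import prod


def serial_availability(tier_availabilities) -> float:
    """Availability of a chain in which every tier must be up (independent failures assumed)."""
    return prod(tier_availabilities)


# Hypothetical per-tier availabilities expressed as fractions, not percentages.
tiers = {"web": 0.999, "application": 0.999, "database": 0.999}
overall = serial_availability(tiers.values())

if __name__ == "__main__":
    print(f"overall availability: {overall:.4%}")
```

Under these assumptions three tiers at 99.9% each give an overall availability of roughly 99.7%, so the
application as a whole is less available than any single tier.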

Database availability plays a critical role in application availability. We use downtime to refer to the
periods when a database is unavailable. Downtime can be either unplanned or planned. Unplanned downtime
occurs without system administrators or DBAs having prepared for it; it may be caused by an unexpected
event such as a hardware or software failure, human error, or even a natural disaster (losing a data
center). Most unplanned downtime can nevertheless be anticipated at design time; for example, when
designing a cluster it is best to assume that everything will fail, considering that most of these clusters
are commodity clusters whose parts do break. The key when designing for availability is to build in
sufficient redundancy, on the assumption that every component (including the entire site) may fail. Planned
downtime is usually associated with scheduled maintenance activities such as a system upgrade or migration.

Unplanned downtime of the Oracle database service can be due to data loss or server failure. Data loss may
be caused by storage medium failure, data corruption, deletion of data through human error, or even data
center failure. Data loss can be a very serious failure because it may turn out to be permanent or may take
a long time to recover from. The solutions to data loss consist of prevention methods and recovery methods.
Prevention methods include disk mirroring, either through RAID (Redundant Array of Independent Disks)
configurations such as RAID 1 (mirroring only) and RAID 10 (mirroring and striping) in the storage array or
through the ASM (Automatic Storage Management) disk group redundancy setting. Chapter 5 will discuss the
details of the RAID and ASM configurations for Oracle Databases. Recovery methods focus on getting the data
back, either by recovering the database from a previous backup, by using flashback recovery, or by
switching to the standby database through a Data Guard failover.

Server failure is usually caused by hardware or software failure. Hardware failure can be the failure of a
physical machine component or of a network or storage connection; software failure can be caused by an OS
crash or by an Oracle database instance or ASM instance failure. Usually during a server failure, the data
in the database remains intact. After the software or hardware issue is fixed, the database service on the
failed server can be resumed once database instance recovery and startup are complete. Database service
downtime due to server failure can be prevented by providing redundant database servers so that the
database service can fail over if the primary server fails. Network and storage connection failures can be
prevented by providing redundant network and storage connections.
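
The benefit of redundant servers or connections can be estimated with a simple probability model. This is
a rough sketch under strong assumptions: the redundant copies fail independently, any one surviving copy
is enough to keep the service up, and failover is instantaneous. Real systems share components and take
time to fail over, so treat the result as an upper bound. The function name and the 99% figure are
hypothetical.

```python
def redundant_availability(single_availability: float, copies: int) -> float:
    """Availability of `copies` redundant components when any single one can carry the service.

    Assumes independent failures and instantaneous failover (simplifications).
    """
    if copies < 1:
        raise ValueError("copies must be at least 1")
    if not 0 <= single_availability <= 1:
        raise ValueError("single_availability must be a fraction between 0 and 1")
    # The service is down only if every copy is down at the same time.
    return 1 - (1 - single_availability) ** copies


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(f"{n} server(s) at 99% each -> {redundant_availability(0.99, n):.6f}")
```

Under these assumptions, adding a second 99% server raises the estimate to about 99.99%, which is why
redundancy is the usual answer to server, network, and storage connection failures.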
