Testing for Availability - Expert Oracle RAC Performance Diagnostics and Tuning


When a server or node crashes, all database components configured on that server or node are also prone to failure. For example, in the previous discussion (Table 3-3), if node prddb1 crashes because its interface to the storage subsystem fails, ASM on that server fails, which in turn causes the database instance on that server to fail. To realize the full potential of RAC, such failures should be transparent to the user and should minimize transaction loss.

Such interruptions can be avoided by implementing fast application notification (FAN) and/or transparent application failover (TAF). OCW has built-in functionality that provides three levels of proactive failover and notification:

1. The OCW automatically fails over any services registered with it to another node or instance, based on the definitions in the OCR. Services and resources can be registered with the OCW using Oracle Enterprise Manager (OEM) or srvctl.

2. The OCW uses the Oracle Notification Service (ONS) to proactively notify participating client machines of state changes by sending DOWN and UP FAN events. Applications that use Oracle Call Interface (OCI) calls interpret these events and react by routing new connections to the new destinations.

3. With a policy-managed configuration, rules can be defined across server pools by maintaining a minimum and maximum number of instances in each pool. When a member of a pool fails and the pool falls below its required number of members, members from another pool are automatically provisioned (provided all the pool management rules are met) and instances are started to support system availability and throughput requirements.
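The pool behavior described in the third item can be illustrated with a small simulation. This is only a toy model under stated assumptions, not how Oracle Clusterware is implemented: the `ServerPool` class, the `handle_failure` function, the pool names, and every server name other than prddb1 are hypothetical. It models only the minimum size rule and an importance ranking, and a donor pool never gives up a server if that would drop it below its own minimum.

```python
from dataclasses import dataclass, field


@dataclass
class ServerPool:
    """Toy model of a server pool (hypothetical, for illustration only)."""
    name: str
    min_size: int
    importance: int
    servers: list = field(default_factory=list)


def handle_failure(pools, failed_server):
    """Drop a failed server, then refill pools that fell below their minimum.

    Pools are refilled in descending order of importance. A donor must hold
    more servers than its own minimum; the least important donor is used first.
    Returns the list of moves as (server, from_pool, to_pool) tuples.
    """
    moves = []
    # Remove the failed server from whichever pool holds it.
    for pool in pools:
        if failed_server in pool.servers:
            pool.servers.remove(failed_server)

    for pool in sorted(pools, key=lambda p: p.importance, reverse=True):
        while len(pool.servers) < pool.min_size:
            donors = [
                p for p in pools
                if p is not pool and len(p.servers) > p.min_size
            ]
            if not donors:
                break  # no pool can spare a server; the pool stays short
            donor = min(donors, key=lambda p: p.importance)
            server = donor.servers.pop()
            pool.servers.append(server)
            moves.append((server, donor.name, pool.name))
    return moves
```

For example, with a two-server pool whose minimum is 2 and a second pool holding one server above its minimum, failing prddb1 in the first pool moves one server across so that the first pool is back at its minimum.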

A service is an abstraction layer of a single system image executed against the same database with common

functionality, quality expectations, and priority relative to other services. Examples of services could be payroll,

accounts payable, order entry, and so on.

TAF allows client applications to continue working after the application loses its connection to the database. Although users may experience a brief pause while the database server fails over to a surviving cluster node, the session context is preserved. When TAF is configured, after the instance failover and database recovery complete, the application can automatically reconnect to one of the surviving instances and continue operations as if no failure had occurred.
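On the client side, TAF is commonly requested through the `FAILOVER_MODE` clause of an Oracle Net connect descriptor. The helper below is a minimal sketch that only assembles such a descriptor string; the function name, host names, service name, and default retry values are assumptions chosen for illustration, not settings taken from this chapter.

```python
def taf_connect_descriptor(hosts, service_name, port=1521,
                           failover_type="SELECT", method="BASIC",
                           retries=20, delay=5):
    """Build an Oracle Net connect descriptor that requests TAF.

    hosts and service_name are supplied by the caller; the values used in
    any example call are hypothetical.
    """
    if failover_type not in {"SESSION", "SELECT", "NONE"}:
        raise ValueError(f"unsupported TAF type: {failover_type}")
    if method not in {"BASIC", "PRECONNECT"}:
        raise ValueError(f"unsupported TAF method: {method}")

    # One ADDRESS entry per cluster node the client may connect to.
    addresses = "".join(
        f"(ADDRESS=(PROTOCOL=TCP)(HOST={host})(PORT={port}))" for host in hosts
    )
    return (
        "(DESCRIPTION="
        f"(ADDRESS_LIST=(LOAD_BALANCE=ON)(FAILOVER=ON){addresses})"
        "(CONNECT_DATA="
        f"(SERVICE_NAME={service_name})"
        f"(FAILOVER_MODE=(TYPE={failover_type})(METHOD={method})"
        f"(RETRIES={retries})(DELAY={delay}))))"
    )
```

Calling `taf_connect_descriptor(["node1-vip", "node2-vip"], "payroll")` yields a single-line descriptor that could be placed in a tnsnames.ora entry. `TYPE=SELECT` asks for in-flight queries to be resumed after reconnection, while `TYPE=SESSION` only re-establishes the session.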

Note: FAN, FCF, and TAF are discussed in detail in Chapter 15.

RAP Phase II—Availability and Load Balancing

Once RAP Phase I testing has shown the various components of the cluster to be stable, the project can proceed as planned with importing the database and data from the current production environment.

Once the database has been configured and its parameters set to match the current production environment, the next phase of RAP testing should be planned. The goal of this test is to verify application behavior when one, several, or all instances in the cluster crash. How will the database tier provide business continuity when one or more components of the database fail? Is the application able to handle such failures? What happens to the user workload: do users notice the failure? What happens when there are media failures and the application cannot retrieve data from or persist data to the database? The answers to all these questions are defined by the business requirements. Phase III validates whether these requirements are met.
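One way to answer "did users notice the failure?" during such a test is to probe the service at a fixed interval while instances are being crashed, then derive the outage windows from the probe log. The function below is a sketch under that assumption; `outage_windows` and the probe log format are hypothetical, and how each probe is performed (for example, a trivial query through the application's connection pool) is left to the test harness.

```python
def outage_windows(probes):
    """Derive outage windows from a time-ordered probe log.

    probes is an iterable of (timestamp_seconds, ok) pairs. Each window is
    returned as (start, end): start is the first failed probe, end is the
    first successful probe after it, or None if the service never recovered
    within the log.
    """
    windows = []
    start = None
    for timestamp, ok in probes:
        if not ok and start is None:
            start = timestamp          # outage begins
        elif ok and start is not None:
            windows.append((start, timestamp))  # outage ends
            start = None
    if start is not None:
        windows.append((start, None))  # still down at the end of the log
    return windows
```

The measured windows can then be compared against the recovery time the business requirements allow for each failure scenario.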
