eth21:1   Link encap:Ethernet  HWaddr 00:22:64:0E:8F:D3
          inet addr:169.254.21.103  Bcast:169.254.63.255  Mask:255.255.192.0
          UP BROADCAST RUNNING MULTICAST  MTU:9000  Metric:1
          Interrupt:98

eth21:2   Link encap:Ethernet  HWaddr 00:22:64:0E:8F:D3
          inet addr:169.254.193.238  Bcast:169.254.255.255  Mask:255.255.192.0
          UP BROADCAST RUNNING MULTICAST  MTU:9000  Metric:1
          Interrupt:98

eth21:3   Link encap:Ethernet  HWaddr 00:22:64:0E:8F:D3
          inet addr:169.254.91.154  Bcast:169.254.127.255  Mask:255.255.192.0
          UP BROADCAST RUNNING MULTICAST  MTU:9000  Metric:1
          Interrupt:98

eth21:4   Link encap:Ethernet  HWaddr 00:22:64:0E:8F:D3
          inet addr:169.254.134.16  Bcast:169.254.191.255  Mask:255.255.192.0
          UP BROADCAST RUNNING MULTICAST  MTU:9000  Metric:1
          Interrupt:98
What if the server had four or five NICs configured as private interconnects? In that case, when a NIC that has an
assigned HAIP fails, the HAIP is reassigned (moved) to another surviving NIC on the node.
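The HAIP-to-NIC mapping can be verified both from Oracle Clusterware and from the database itself. The following is a minimal sketch, assuming the Grid Infrastructure home is in $GRID_HOME and an instance is already open; the 169.254.x.x addresses returned by the query should match the addresses shown in the ifconfig output above.

# Interfaces registered with Clusterware for the cluster interconnect
$GRID_HOME/bin/oifcfg getif

# The HAIP resource maintained on the lower (init) stack
$GRID_HOME/bin/crsctl stat res ora.cluster_interconnect.haip -init

# Interconnect addresses the instance is actually using
sqlplus -s / as sysdba <<'EOF'
SELECT name, ip_address, is_public FROM gv$cluster_interconnects;
EOF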
Node Failure
RAC comprises two or more instances sharing a single copy of a physical database. Prior to Oracle Database 11g
Release 2, each instance was normally configured to run on a specific node. This type of configuration, where each
instance is physically mapped to a specific node in the cluster, is called an admin-managed configuration. Starting with
Oracle Database 11g Release 2, Oracle provides another method of database configuration in which the instances are
optionally not mapped to any specific server in the cluster but instead run on servers drawn from a server pool. This
feature is called a policy-managed configuration.
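Which management model a given database uses can be checked with srvctl; the sketch below assumes a hypothetical database named PRODDB.

# Database configuration; a policy-managed database lists its server pools here
srvctl config database -d PRODDB

# Server pools defined in the cluster
srvctl config srvpool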
In a RAC configuration, if a node fails, the instance on that node fails with it, including the Global Cache Service (GCS)
resources held in its buffer cache and shared pool as well as the GCS processes running on that node. Under such circumstances,
the GCS must reconfigure itself to re-master the locks that were being managed by the failed node before instance
recovery can occur. During the reconfiguration process, the global buffer cache locks are replayed to produce a consistent
state of the memory buffers.
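The cost of such a reconfiguration can be reviewed afterward from a surviving instance. As a rough sketch (the statistics are cumulative since instance startup), GV$DYNAMIC_REMASTER_STATS summarizes remastering activity across instances:

# Review remastering statistics on a surviving instance
sqlplus -s / as sysdba <<'EOF'
SELECT * FROM gv$dynamic_remaster_stats;
EOF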
Many cluster hardware vendors use a disk-based quorum system that allows each node to determine which other
nodes are currently active members of the cluster. These systems also allow a node to remove itself from the cluster or
to remove other nodes from the cluster. The latter is accomplished through a type of voting system, managed through
the shared quorum disk, that allows nodes to determine which node will remain active if one or more of the nodes
become disconnected from the cluster interconnect.
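Oracle Clusterware exposes its own version of this quorum mechanism; the voting files and current cluster membership can be inspected as in the short sketch below (again assuming $GRID_HOME points to the Grid Infrastructure home).

# Voting files currently in use by Cluster Synchronization Services (CSS)
$GRID_HOME/bin/crsctl query css votedisk

# Cluster nodes with their node numbers and current status
$GRID_HOME/bin/olsnodes -n -s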
Using OCW places RAC in a more advantageous position because RAC uses both methods for failure detection.
While OCW maintains heartbeat functions over the cluster interconnect (the network heartbeat, or NHB), it also
writes to and checks the voting disk (the disk heartbeat, or DHB) to determine whether a node has lost
communication. Using these heartbeat mechanisms, OCW is able to verify the health of the other member nodes
participating in the cluster. At each heartbeat, every member node reports its status to the other members. If they all
agree, nothing further is done until the next heartbeat. If two or more nodes report a different configuration (e.g., the
cluster interconnect is broken between a pair of nodes), then one member arbitrates among the different members of
the cluster.
When a node, or the communication to a node, fails, the NHB between the two nodes in question stops succeeding.
After waiting for the time-out period (defined by the misscount parameter), one of the remaining nodes detects the
failure and attempts to re-form the cluster. If the remaining nodes in the cluster are able to form a quorum, OCW
reorganizes the cluster membership.
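The time-outs that drive this eviction logic can be read with crsctl; a brief sketch, assuming the Grid Infrastructure binaries are in the PATH:

# Network heartbeat time-out, in seconds (the misscount parameter)
crsctl get css misscount

# Disk heartbeat time-out, in seconds
crsctl get css disktimeout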
 