eth21:1   Link encap:Ethernet  HWaddr 00:22:64:0E:8F:D3
          inet addr:169.254.21.103  Bcast:169.254.63.255  Mask:255.255.192.0
          UP BROADCAST RUNNING MULTICAST  MTU:9000  Metric:1
          Interrupt:98

eth21:2   Link encap:Ethernet  HWaddr 00:22:64:0E:8F:D3
          inet addr:169.254.193.238  Bcast:169.254.255.255  Mask:255.255.192.0
          UP BROADCAST RUNNING MULTICAST  MTU:9000  Metric:1
          Interrupt:98

eth21:3   Link encap:Ethernet  HWaddr 00:22:64:0E:8F:D3
          inet addr:169.254.91.154  Bcast:169.254.127.255  Mask:255.255.192.0
          UP BROADCAST RUNNING MULTICAST  MTU:9000  Metric:1
          Interrupt:98

eth21:4   Link encap:Ethernet  HWaddr 00:22:64:0E:8F:D3
          inet addr:169.254.134.16  Bcast:169.254.191.255  Mask:255.255.192.0
          UP BROADCAST RUNNING MULTICAST  MTU:9000  Metric:1
          Interrupt:98
What if the server had four or five NICs configured as private interconnects? In that case, when a NIC that has an
assigned HAIP fails, the HAIP is reassigned (moved) to another surviving NIC on the node.
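The HAIP-to-NIC mapping can be verified both from Oracle Clusterware and from the database itself. The following is a minimal sketch, assuming the Grid Infrastructure home is in $GRID_HOME and an instance is already open; the 169.254.x.x addresses returned by the query should match the addresses shown in the ifconfig output above.

# Interfaces registered with Clusterware for the cluster interconnect
$GRID_HOME/bin/oifcfg getif

# The HAIP resource maintained on the lower (init) stack
$GRID_HOME/bin/crsctl stat res ora.cluster_interconnect.haip -init

# Interconnect addresses the instance is actually using
sqlplus -s / as sysdba <<'EOF'
SELECT name, ip_address, is_public FROM gv$cluster_interconnects;
EOF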
Node Failure
RAC comprises two or more instances sharing a single copy of a physical database. Prior to Oracle Database 11g
Release 2, each instance was normally configured to run on a specific node. This type of configuration, where each
instance is physically mapped to a specific node in the cluster, is called an admin-managed configuration. Starting with
Oracle Database 11g Release 2, Oracle provides another method of database configuration in which the instances are
optionally not mapped to any specific server in the cluster but instead run on servers drawn from a server pool. This
feature is called a policy-managed configuration.
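Which management model a given database uses can be checked with srvctl; the sketch below assumes a hypothetical database named PRODDB.

# Database configuration; a policy-managed database lists its server pools here
srvctl config database -d PRODDB

# Server pools defined in the cluster
srvctl config srvpool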
In a RAC configuration, if a node fails, the instance on that node fails with it, including the Global Cache Service (GCS)
resources held in its buffer cache and shared pool as well as the GCS processes running on that node. Under such circumstances,
the GCS must reconfigure itself to re-master the locks that were being managed by the failed node before instance
recovery can occur. During the reconfiguration process, the global buffer cache locks are replayed to produce a consistent
state of the memory buffers.
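The cost of such a reconfiguration can be reviewed afterward from a surviving instance. As a rough sketch (the statistics are cumulative since instance startup), GV$DYNAMIC_REMASTER_STATS summarizes remastering activity across instances:

# Review remastering statistics on a surviving instance
sqlplus -s / as sysdba <<'EOF'
SELECT * FROM gv$dynamic_remaster_stats;
EOF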
Many cluster hardware vendors use a disk-based quorum system that allows each node to determine which other
nodes are currently active members of the cluster. These systems also allow a node to remove itself from the cluster or
to remove other nodes from the cluster. The latter is accomplished through a type of voting system, managed through
the shared quorum disk, that allows nodes to determine which node will remain active if one or more of the nodes
become disconnected from the cluster interconnect.
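Oracle Clusterware exposes its own version of this quorum mechanism; the voting files and current cluster membership can be inspected as in the short sketch below (again assuming $GRID_HOME points to the Grid Infrastructure home).

# Voting files currently in use by Cluster Synchronization Services (CSS)
$GRID_HOME/bin/crsctl query css votedisk

# Cluster nodes with their node numbers and current status
$GRID_HOME/bin/olsnodes -n -s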
Using OCW places RAC in a more advantageous position because RAC uses both methods for failure detection.
While OCW maintains heartbeat functions over the cluster interconnect (the network heartbeat, or NHB), it also
writes to and checks the voting disk (the disk heartbeat, or DHB) to determine whether a node has lost
communication. Using these heartbeat mechanisms, OCW is able to verify the health of the other member nodes
participating in the cluster. At each heartbeat, every member node reports its status to the other members. If they all
agree, nothing further is done until the next heartbeat. If two or more nodes report a different configuration (e.g., the
cluster interconnect is broken between a pair of nodes), then one member arbitrates among the different members of
the cluster.
When a node, or the communication to a node, fails, the NHB between the two nodes in question stops succeeding.
After waiting for the time-out period (defined by the misscount parameter), one of the remaining nodes detects the
failure and attempts to re-form the cluster. If the remaining nodes in the cluster are able to form a quorum, OCW
reorganizes the cluster membership.
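The time-outs that drive this eviction logic can be read with crsctl; a brief sketch, assuming the Grid Infrastructure binaries are in the PATH:

# Network heartbeat time-out, in seconds (the misscount parameter)
crsctl get css misscount

# Disk heartbeat time-out, in seconds
crsctl get css disktimeout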
 