Capacity Planning and Architecture - Expert Oracle RAC Performance Diagnostics and Tuning


Heartbeats

Heartbeat is a polling mechanism used in clustered platforms to verify whether the other servers participating in the cluster are alive. Oracle also uses a heartbeat mechanism to verify the health of the other nodes participating in the cluster. In a four-node cluster (Figure 2-4), every node polls every other node in the cluster: ORADB1 sends a heartbeat message to ORADB2, ORADB3, and ORADB4. Similarly, ORADB2 sends a heartbeat message to ORADB1, ORADB3, and ORADB4. This helps each server in the cluster understand the health of the other servers and take appropriate action should polling fail. In RAC, CSS performs polling using three different methods:

• Network Heartbeat (NHB)
• Disk Heartbeat (DHB)
• Local Heartbeat (LHB)
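The all-to-all pattern described above can be illustrated with a short Python sketch. It only models the message topology (who sends a heartbeat to whom), not how CSS is implemented; the function name `heartbeat_pairs` is hypothetical.

```python
from itertools import permutations

def heartbeat_pairs(nodes):
    """Return every (sender, receiver) pair in an all-to-all heartbeat mesh."""
    # Order matters: (A, B) and (B, A) are separate messages,
    # so ordered permutations of length 2 are used.
    return list(permutations(nodes, 2))

nodes = ["ORADB1", "ORADB2", "ORADB3", "ORADB4"]
pairs = heartbeat_pairs(nodes)
print(len(pairs))  # 12, i.e., n * (n - 1) messages per interval
print([receiver for sender, receiver in pairs if sender == "ORADB1"])
# ['ORADB2', 'ORADB3', 'ORADB4']
```

Because every node talks to every other node, the number of heartbeat messages per interval grows as n(n - 1) with the number of nodes.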

Network Heartbeat (NHB)

The NHB is sent over the private interconnect. CSS sends an NHB every second from one node to all the other nodes in the cluster and, similarly, receives an NHB from each remote node every second. The NHB contains timestamp information from the local node and is used by the remote nodes. If an acknowledgment is not received from another node in the cluster within 30 seconds (represented by the misscount value), CSS would request a cluster reconfiguration. The reconfiguration is not always required, however: CSS first verifies the health and state of the node through other methods before deciding whether to reconfigure.
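The miss-detection rule above can be sketched as follows, assuming we track the epoch time of the last NHB received from each remote node. The function name, the `last_nhb` dictionary, and the sample timestamps are hypothetical; the 30-second threshold is the misscount value from the text. A node flagged here is only a candidate, since CSS checks other evidence before reconfiguring.

```python
MISSCOUNT_SECONDS = 30  # misscount value described in the text

def nodes_missing_nhb(last_nhb, now, misscount=MISSCOUNT_SECONDS):
    """Return remote nodes whose last NHB is older than misscount seconds.

    last_nhb maps node name -> epoch seconds of the last NHB received.
    The result lists suspects only; it is not an eviction decision.
    """
    return sorted(node for node, ts in last_nhb.items() if now - ts > misscount)

# Hypothetical sample: ORADB4 was last heard from 40 seconds ago.
last_nhb = {"ORADB2": 1000.0, "ORADB3": 975.0, "ORADB4": 960.0}
print(nodes_missing_nhb(last_nhb, now=1000.0))  # ['ORADB4']
```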

Disk Heartbeat (DHB)

Apart from the NHB, CSS uses the DHB, which is required for split-brain resolution. The DHB contains a timestamp of the local time in Unix epoch seconds as well as a millisecond timer.

■ Note  Unix time (also called POSIX time or a Unix timestamp) is the number of seconds that have elapsed since the Unix epoch, midnight Coordinated Universal Time (UTC)/Greenwich Mean Time (GMT) on January 1, 1970, not counting leap seconds (in ISO 8601: 1970-01-01T00:00:00Z). "Epoch" is often used as a synonym for "Unix time."
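To make the two DHB fields concrete, here is a sketch that packs Unix epoch seconds and a millisecond timer into a fixed-size record. The 16-byte little-endian layout and the use of a monotonic clock for the millisecond timer are assumptions for illustration only; this is not Oracle's voting-disk format.

```python
import struct
import time

# Hypothetical layout: 8-byte epoch seconds + 8-byte millisecond timer,
# little-endian. NOT Oracle's actual on-disk format.
DHB_FORMAT = "<qq"

def make_dhb_record(epoch_seconds=None, ms_timer=None):
    """Pack a heartbeat record holding Unix epoch seconds and a millisecond timer."""
    if epoch_seconds is None:
        epoch_seconds = int(time.time())  # seconds since 1970-01-01T00:00:00Z
    if ms_timer is None:
        ms_timer = time.monotonic_ns() // 1_000_000  # monotonic milliseconds (assumed)
    return struct.pack(DHB_FORMAT, epoch_seconds, ms_timer)

def read_dhb_record(record):
    """Unpack a record produced by make_dhb_record into (epoch_seconds, ms_timer)."""
    return struct.unpack(DHB_FORMAT, record)

rec = make_dhb_record(epoch_seconds=1_700_000_000, ms_timer=123_456)
print(len(rec), read_dhb_record(rec))  # 16 (1700000000, 123456)
```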

The DHB is the definitive mechanism for deciding whether a node is still alive. With the DHB, each server in the cluster writes a timestamp to the voting disk every second. In the case of an NHB failure, CSS checks the voting disk to see whether the node in question has written any timestamp during the timeframe in which its NHBs were missed, and uses that to decide whether cluster reconfiguration is required.
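The voting-disk check described above can be sketched as a simple window test, assuming we have the list of DHB timestamps each node wrote. The function name and the `dhb_writes` data are hypothetical, and the sketch models only this one check; the rest of the reconfiguration and split-brain logic in CSS is not represented.

```python
def wrote_dhb_during(node, window_start, window_end, dhb_writes):
    """Return True if `node` wrote at least one DHB timestamp inside the window.

    dhb_writes maps node name -> epoch-second timestamps written to the voting disk.
    """
    return any(window_start <= ts <= window_end for ts in dhb_writes.get(node, []))

# Hypothetical sample: both nodes missed NHBs between t=970 and t=1002.
dhb_writes = {
    "ORADB3": [971, 972, 973, 1001, 1002],  # kept writing to the voting disk
    "ORADB4": [955, 956, 957],              # stopped writing before the window
}
print(wrote_dhb_during("ORADB3", 970, 1002, dhb_writes))  # True
print(wrote_dhb_during("ORADB4", 970, 1002, dhb_writes))  # False
```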

Unlike the NHB, the DHB is driven by two parameters: a “long disk I/O” (LIOT) value and a “short disk I/O” (SIOT) value. When a node's DHBs are missing for too long, the node is assumed to be dead. When connectivity to the disk is lost for too long, the disk is considered offline.
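The two conditions above are evaluated independently: one concerns a remote node's heartbeats on disk, the other the local node's I/O path to the disk. The sketch below models them with two hypothetical thresholds. The text does not specify how LIOT and SIOT map onto these checks, so the parameter names and the sample values are illustrative assumptions, not Oracle defaults.

```python
from dataclasses import dataclass

@dataclass
class DhbTimeouts:
    # Hypothetical thresholds in seconds; their relation to LIOT/SIOT
    # is deliberately left unspecified.
    node_dead_after: float
    disk_offline_after: float

def classify(now, last_dhb_seen, last_disk_io_ok, t):
    """Return (node_state, disk_state) from two independent DHB-related timers.

    last_dhb_seen:   epoch seconds of the remote node's latest DHB on the voting disk
    last_disk_io_ok: epoch seconds of the local node's latest successful disk I/O
    """
    node_state = "dead" if now - last_dhb_seen > t.node_dead_after else "alive"
    disk_state = "offline" if now - last_disk_io_ok > t.disk_offline_after else "online"
    return node_state, disk_state

t = DhbTimeouts(node_dead_after=20.0, disk_offline_after=60.0)  # illustrative values
print(classify(now=1000.0, last_dhb_seen=960.0, last_disk_io_ok=990.0, t=t))
# ('dead', 'online')
```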
