Heartbeats
A heartbeat is a polling mechanism used in clustered platforms to verify that the other servers participating in the
cluster are alive. Oracle also uses the heartbeat mechanism to verify the health of the other nodes in the cluster. In a
four-node cluster (Figure 2-4), every node polls every other node in the cluster: ORADB1 sends a heartbeat message
to ORADB2, ORADB3, and ORADB4. Similarly, ORADB2 sends a heartbeat message to ORADB1, ORADB3, and ORADB4. This
helps each server in the cluster understand the health of the other servers and take appropriate action
should polling fail. In RAC, Cluster Synchronization Services (CSS) performs polling using three different methods:
Network Heartbeat (NHB)
Disk Heartbeat (DHB)
Local Heartbeat (LHB)
Network Heartbeat (NHB)
The NHB is sent over the private interconnect. CSS sends an NHB every second from one node to all the other nodes
in the cluster and, likewise, receives an NHB from each remote node every second. The NHB contains timestamp
information from the local node, which is used by the remote node. If an acknowledgment is not received from another node
in the cluster within 30 seconds (represented by the misscount value), CSS requests a cluster reconfiguration.
Reconfiguration is not always required, however: CSS verifies the health and state of the node through other methods
before deciding whether to reconfigure.
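The misscount check described above can be sketched in a few lines. This is only an illustration under stated assumptions, not how CSS is implemented: the class name `NetworkHeartbeatMonitor` and its methods are hypothetical, and the only value taken from the text is the 30-second timeout.

```python
from dataclasses import dataclass, field

MISSCOUNT_SECONDS = 30  # timeout quoted in the text


@dataclass
class NetworkHeartbeatMonitor:
    """Hypothetical tracker of the last NHB received from each remote node."""
    misscount: float = MISSCOUNT_SECONDS
    last_seen: dict = field(default_factory=dict)

    def record(self, node: str, received_at: float) -> None:
        # Remember when the most recent NHB from this node arrived.
        self.last_seen[node] = received_at

    def suspects(self, now: float) -> list:
        # Nodes whose last NHB is older than misscount seconds.
        return sorted(
            node for node, seen in self.last_seen.items()
            if now - seen > self.misscount
        )


monitor = NetworkHeartbeatMonitor()
for name in ("ORADB2", "ORADB3", "ORADB4"):
    monitor.record(name, received_at=0.0)
# ORADB2 and ORADB3 keep sending; ORADB4 goes silent.
monitor.record("ORADB2", received_at=40.0)
monitor.record("ORADB3", received_at=40.0)
suspect_nodes = monitor.suspects(now=45.0)
print(suspect_nodes)
```

A node that appears in the result is only a candidate for eviction; as the text notes, its state is verified through other methods before a reconfiguration is decided.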
Disk Heartbeat (DHB)
Apart from the NHB, we use the DHB, which is required for split-brain resolution. It contains a timestamp of the local
time in Unix epoch seconds as well as a millisecond timer.
Note: The Unix epoch (or Unix time, POSIX time, or Unix timestamp) is the number of seconds that have elapsed
since January 1, 1970 (midnight Universal Time [UTC]/Greenwich Mean Time [GMT]), not counting leap seconds
(in ISO 8601: 1970-01-01T00:00:00Z); “epoch” is often used as a synonym for “Unix time.”
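As a small illustration of the epoch-seconds-plus-milliseconds timestamp mentioned above, the following sketch splits a clock reading into those two parts. The on-disk layout of the DHB is not described in the text; treating the millisecond timer as the sub-second part of the clock is an assumption made purely for this example, and `epoch_stamp` is a hypothetical helper.

```python
import time
from datetime import datetime, timezone


def epoch_stamp(now_ns: int) -> tuple:
    """Split a nanosecond clock reading into (epoch seconds, milliseconds)."""
    seconds, remainder_ns = divmod(now_ns, 1_000_000_000)
    return seconds, remainder_ns // 1_000_000


# Current time as whole epoch seconds plus a millisecond component.
secs, millis = epoch_stamp(time.time_ns())
print(secs, millis)

# Second 0 of the epoch, rendered in ISO 8601 form.
epoch_zero = datetime.fromtimestamp(0, tz=timezone.utc).isoformat()
print(epoch_zero)
```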
The DHB is the definitive mechanism for deciding whether a node is still alive. With the DHB, each server in the
cluster writes a timestamp to the voting disk every second. In the case of NHB failure, CSS checks the voting disk
to see whether the node in question has written any timestamp during the period in which NHBs were missed, and
uses that to decide whether cluster reconfiguration is required.
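The decision described above can be sketched as a comparison between the last timestamp a node wrote to the voting disk and the moment its NHBs stopped arriving. This is a simplified model under stated assumptions: the voting disk is represented as a plain dictionary, and `nodes_needing_reconfiguration` is a hypothetical function, not part of Oracle Clusterware.

```python
def nodes_needing_reconfiguration(voting_disk: dict, nhb_lost_at: dict) -> list:
    """Return nodes that wrote no DHB timestamp after their NHBs stopped.

    voting_disk maps node name -> last DHB timestamp written (epoch seconds).
    nhb_lost_at maps node name -> time the last NHB was received from it.
    """
    dead = []
    for node, lost_at in nhb_lost_at.items():
        last_write = voting_disk.get(node)
        # No disk write since the network went quiet: treat the node as dead.
        if last_write is None or last_write < lost_at:
            dead.append(node)
    return sorted(dead)


# ORADB3 stopped writing before its NHBs were lost; ORADB4 is still writing.
voting_disk = {"ORADB3": 1000.0, "ORADB4": 1042.0}
nhb_lost_at = {"ORADB3": 1010.0, "ORADB4": 1010.0}
dead_nodes = nodes_needing_reconfiguration(voting_disk, nhb_lost_at)
print(dead_nodes)
```

In this example ORADB4 has lost network heartbeats but is still updating the voting disk, so it is not reported, which mirrors the point that an NHB failure alone does not always force a reconfiguration.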
Unlike the NHB, the DHB is driven by two parameters: a “long disk I/O” (LIOT) value and a “short disk
I/O” (SIOT) value. When DHBs are missing for too long, the node is assumed to be dead. When connectivity
to the disk is lost for too long, the disk is considered offline.
 
 