In the preceding output (formatted to fit the page), the critical lines to examine are those with "fatal" in the message:
node ssky3l12p2 (2) at 50% heartbeat fatal, removal in 14.020 seconds
node ssky3l12p2 (2) is impending reconfig, flag 2228224, misstime 15980
local diskTimeout set to 27000 ms, remote disk timeout set to 27000, impending reconfig status(1)
node ssky3l12p2 (2) at 75% heartbeat fatal, removal in 7.020 seconds
node ssky3l12p2 (2) at 90% heartbeat fatal, removal in 2.010 seconds, seedhbimpd 1
Removal started for node ssky3l12p2 (2), flags 0x220000, state 3, wt4c 0
node(2) inactive
The NHB failure and the corresponding node eviction follow a countdown, from the 50% fatal warning to the 90% fatal warning, before the node is finally removed from the cluster. If the NHB is restored within this window, the eviction is canceled and the cluster returns to normal operation.
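The warning percentages are fractions of the CSS misscount setting (the NHB timeout, typically 30 seconds on Linux), which is why the 50% warning above reports roughly 14 seconds remaining. As a minimal sketch, the configured misscount and the countdown entries can be pulled from a node as follows; the log path is only illustrative and assumes an 11g-style Grid Infrastructure layout, with $GRID_HOME standing in for the actual Grid home:

# Configured network heartbeat timeout in seconds; the 50%/75%/90% warnings
# fire at the corresponding fractions of this value
crsctl get css misscount

# Surface the countdown warnings and the final removal entry from the CSS daemon log
grep -E "heartbeat fatal|Removal started" $GRID_HOME/log/$(hostname -s)/cssd/ocssd.log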
Node Eviction Due to Missing DHB
Normally, the DHB check is a continuation of the NHB error condition (a split network) and serves as a second check to verify the health of the node that is not responding. In this case, the timestamp of the last NHB is compared with the most recent DHB timestamp, within the short I/O timeout (SIOT), to determine whether the node is still alive. The following output (formatted to fit the page) is extracted from the ocssd.log file. These entries are visible when CSSD logging is set to level 3; search the log for the "CheckSplit" string:
cat ocssd.log | grep CheckSplit
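The diskTimeout of 27000 ms seen in the earlier output is the short I/O timeout used while a reconfiguration is impending; it is commonly described as misscount minus the reboottime setting. The related settings can be read with the standard crsctl commands below; the exact values depend on the version and any site-specific changes:

# Long disk I/O timeout (voting disk writes during normal operation), in seconds
crsctl get css disktimeout

# Reboot time; the short I/O timeout reported in the log is typically misscount minus this value
crsctl get css reboottime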
The other situation is the split-brain scenario, in which the voting disk does not respond to the timestamp write from a node. This happens when there are two or more nodes with no communication between them. In Figure 16-1, which shows a four-node cluster, oradb1 and oradb2 can communicate with each other, and oradb3 and oradb4 can communicate with each other, but the two pairs cannot reach each other over the interconnect.
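When investigating a suspected split brain, it also helps to confirm which voting disks the cluster is using and whether they are online; the command below is standard crsctl syntax and is shown only as a starting point:

# List the configured voting disks and their current state
crsctl query css votedisk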
 