2013-03-17 05:25:09.842
[cssd(7335)]CRS-1601:CSSD Reconfiguration complete. Active nodes are node01,node02,node03.
2013-03-17 05:37:53.185
[cssd(7335)]CRS-1601:CSSD Reconfiguration complete. Active nodes are node01,node02,node03.
The preceding extract is from the node01 alert.log file of a four-node cluster setup and indicates the eviction of node04. You can see from this output that the first warning appeared when the outgoing node had missed 50% of the timeout interval, followed by warning messages at 75% and 90%. The reboot advisory message identifies the component that actually initiated the node eviction; in this example it was cssmonit, which triggered the eviction because of a lack of memory on the node.
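To put the 50/75/90% warning pattern in perspective, the short Python sketch below computes the expected time remaining before removal at each warning for a given CSS misscount. The 30-second misscount used here is the common Linux default and is an assumption, not a value read from these logs; confirm your own setting with crsctl get css misscount.
import sys

MISSCOUNT_SECONDS = 30  # assumed default; verify with "crsctl get css misscount"

def warning_timeline(misscount=MISSCOUNT_SECONDS):
    # Time left before node removal when 50%, 75%, and 90% of the
    # network heartbeat timeout has been missed.
    return [(pct, misscount * (100 - pct) / 100.0) for pct in (50, 75, 90)]

if __name__ == "__main__":
    for pct, remaining in warning_timeline():
        sys.stdout.write("%d%% heartbeat fatal -> removal in ~%.1f seconds\n" % (pct, remaining))
With a 30-second misscount this yields roughly 15, 7.5, and 3 seconds, which lines up with the "removal in 14.859/6.760/2.720 seconds" values reported in the ocssd.log extract that follows.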
Here is the corresponding extract from the ocssd.log file of a surviving node, node01, regarding the node04 eviction:
2013-03-17 05:24:53.928: [ CSSD][53]clssnmPollingThread: node node04 (04) at 50% heartbeat fatal,
removal in 14.859 seconds
2013-03-17 05:24:53.928: [ CSSD][53]clssnmPollingThread: node node04 (04) is impending reconfig,
flag 461838, misstime 15141
2013-03-17 05:24:53.938: [ CSSD][53]clssnmPollingThread: local diskTimeout set to 27000 ms,
remote disk timeout set to 27000, impending reconfig status(1)
2013-03-17 05:24:53.938: [ CSSD][40]clssnmvDHBValidateNCopy: node 04, node04, has a disk HB,
but no network HB, DHB has rcfg 287, wrtcnt, 77684032, LATS 4262516131, lastSeqNo 0, uniqueness
1356505641, timestamp 1363487079/4262228928
2013-03-17 05:24:53.938: [ CSSD][49]clssnmvDHBValidateNCopy: node 04, node04, has a disk HB,
but no network HB, DHB has rcfg 287, wrtcnt, 77684035, LATS 4262516131, lastSeqNo 0, uniqueness
1356505641, timestamp 1363487079/4262228929
2013-03-17 05:24:54.687: [ CSSD][54]clssnmSendingThread: sending status msg to all nodes
2013-03-17 05:25:02.028: [ CSSD][53]clssnmPollingThread: node node04 (04) at 75% heartbeat fatal,
removal in 6.760 seconds
2013-03-17 05:25:02.767: [ CSSD][54]clssnmSendingThread: sending status msg to all nodes
2013-03-17 05:25:06.068: [ CSSD][53]clssnmPollingThread: node node04 (04) at 90% heartbeat fatal,
removal in 2.720 seconds, seedhbimpd 1
2013-03-17 05:25:06.808: [ CSSD][54]clssnmSendingThread: sending status msg to all nodes
2013-03-17 05:25:06.808: [ CSSD][54]clssnmSendingThread: sent 4 status msgs to all nodes
In the preceding output, clssnmPollingThread, the CSS daemon thread responsible for scanning and verifying the active node members, periodically reports the heartbeats missed by node04 and also records the timestamps leading up to the node eviction.
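When an ocssd.log is large, it can help to pull out just these heartbeat warnings. The following Python sketch is one way to do that; the regular expression is written against the message format shown above, and the file path and function names are illustrative assumptions rather than anything supplied by Oracle.
import re

# Matches warnings of the form shown above, e.g.:
# "... clssnmPollingThread: node node04 (04) at 75% heartbeat fatal, removal in 6.760 seconds"
HB_WARNING = re.compile(
    r"^(?P<ts>\S+ \S+): .*clssnmPollingThread: node (?P<node>\S+) .*"
    r"at (?P<pct>\d+)% heartbeat fatal, removal in (?P<secs>[\d.]+) seconds"
)

def heartbeat_warnings(logfile, node=None):
    # Yield (timestamp, node name, percent missed, seconds until removal).
    with open(logfile) as f:
        for line in f:
            m = HB_WARNING.search(line)
            if m and (node is None or m.group("node") == node):
                yield (m.group("ts"), m.group("node"),
                       int(m.group("pct")), float(m.group("secs")))

# Example usage (the log path is hypothetical):
# for ts, node, pct, secs in heartbeat_warnings("ocssd.log", node="node04"):
#     print(ts, node, "%d%% heartbeat fatal, removal in %.3fs" % (pct, secs))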
The following extract from the ocssd.log on node01 shows the details of the node being evicted: node cleanup, the node leaving the cluster, and the rejoining sequence:
2013-03-17 05:25:08.807: [ CSSD][53]clssnmPollingThread: Removal started for node node04 (14),
flags 0x70c0e, state 3, wt4c 0
2013-03-17 05:25:08.807: [ CSSD][53]clssnmMarkNodeForRemoval: node 04, node04 marked for removal
2013-03-17 05:25:08.807: [ CSSD][53]clssnmDiscHelper: node04, node(04) connection failed, endp
(00000000000020c8), probe(0000000000000000), ninf->endp 00000000000020c8
2013-03-17 05:25:08.807: [ CSSD][53]clssnmDiscHelper: node 04 clean up, endp (00000000000020c8),
init state 5, cur state 5
2013-03-17 05:25:08.807: [GIPCXCPT][53] gipcInternalDissociate: obj 600000000246e210
[00000000000020c8] { gipcEndpoint : localAddr '
2013-03-17 05:25:08.821: [ CSSD][1]clssgmCleanupNodeContexts(): cleaning up nodes, rcfg(286)
 