2013-03-17 05:25:09.842
[cssd(7335)]CRS-1601:CSSD Reconfiguration complete. Active nodes are node01,node02,node03.
2013-03-17 05:37:53.185
[cssd(7335)]CRS-1601:CSSD Reconfiguration complete. Active nodes are node01,node02,node03.
The preceding extract is from the node01 alert.log file of a four-node cluster setup and indicates the eviction of node04. You can see from this output that the first warning appeared when the outgoing node had missed 50% of the timeout interval, followed by warning messages at 75% and 90%. The reboot advisory message identifies the component that actually initiated the node eviction; in this example it was cssmonit, which triggered the eviction because of a lack of memory on the node.
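To put the 50/75/90% warning pattern in perspective, the short Python sketch below computes the expected time remaining before removal at each warning for a given CSS misscount. The 30-second misscount used here is the common Linux default and is an assumption, not a value read from these logs; confirm your own setting with crsctl get css misscount.
import sys

MISSCOUNT_SECONDS = 30  # assumed default; verify with "crsctl get css misscount"

def warning_timeline(misscount=MISSCOUNT_SECONDS):
    # Time left before node removal when 50%, 75%, and 90% of the
    # network heartbeat timeout has been missed.
    return [(pct, misscount * (100 - pct) / 100.0) for pct in (50, 75, 90)]

if __name__ == "__main__":
    for pct, remaining in warning_timeline():
        sys.stdout.write("%d%% heartbeat fatal -> removal in ~%.1f seconds\n" % (pct, remaining))
With a 30-second misscount this yields roughly 15, 7.5, and 3 seconds, which lines up with the "removal in 14.859/6.760/2.720 seconds" values reported in the ocssd.log extract that follows.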
Here is the corresponding extract from the ocssd.log file of a surviving node, node01, regarding the node04 eviction:
2013-03-17 05:24:53.928: [ CSSD][53]clssnmPollingThread: node node04 (04) at 50% heartbeat fatal,
removal in 14.859 seconds
2013-03-17 05:24:53.928: [ CSSD][53]clssnmPollingThread: node node04 (04) is impending reconfig,
flag 461838, misstime 15141
2013-03-17 05:24:53.938: [ CSSD][53]clssnmPollingThread: local diskTimeout set to 27000 ms,
remote disk timeout set to 27000, impending reconfig status(1)
2013-03-17 05:24:53.938: [ CSSD][40]clssnmvDHBValidateNCopy: node 04, node04, has a disk HB,
but no network HB, DHB has rcfg 287, wrtcnt, 77684032, LATS 4262516131, lastSeqNo 0, uniqueness
1356505641, timestamp 1363487079/4262228928
2013-03-17 05:24:53.938: [ CSSD][49]clssnmvDHBValidateNCopy: node 04, node04, has a disk HB,
but no network HB, DHB has rcfg 287, wrtcnt, 77684035, LATS 4262516131, lastSeqNo 0, uniqueness
1356505641, timestamp 1363487079/4262228929
2013-03-17 05:24:54.687: [ CSSD][54]clssnmSendingThread: sending status msg to all nodes
2013-03-17 05:25:02.028: [ CSSD][53]clssnmPollingThread: node node04 (04) at 75% heartbeat fatal,
removal in 6.760 seconds
2013-03-17 05:25:02.767: [ CSSD][54]clssnmSendingThread: sending status msg to all nodes
2013-03-17 05:25:06.068: [ CSSD][53]clssnmPollingThread: node node04 (04) at 90% heartbeat fatal,
removal in 2.720 seconds, seedhbimpd 1
2013-03-17 05:25:06.808: [ CSSD][54]clssnmSendingThread: sending status msg to all nodes
2013-03-17 05:25:06.808: [ CSSD][54]clssnmSendingThread: sent 4 status msgs to all nodes
In the preceding output, clssnmPollingThread, the CSS daemon thread responsible for scanning and verifying the active node members, periodically reports the heartbeats missed by node04 and also records the timestamps leading up to the node eviction.
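When an ocssd.log is large, it can help to pull out just these heartbeat warnings. The following Python sketch is one way to do that; the regular expression is written against the message format shown above, and the file path and function names are illustrative assumptions rather than anything supplied by Oracle.
import re

# Matches warnings of the form shown above, e.g.:
# "... clssnmPollingThread: node node04 (04) at 75% heartbeat fatal, removal in 6.760 seconds"
HB_WARNING = re.compile(
    r"^(?P<ts>\S+ \S+): .*clssnmPollingThread: node (?P<node>\S+) .*"
    r"at (?P<pct>\d+)% heartbeat fatal, removal in (?P<secs>[\d.]+) seconds"
)

def heartbeat_warnings(logfile, node=None):
    # Yield (timestamp, node name, percent missed, seconds until removal).
    with open(logfile) as f:
        for line in f:
            m = HB_WARNING.search(line)
            if m and (node is None or m.group("node") == node):
                yield (m.group("ts"), m.group("node"),
                       int(m.group("pct")), float(m.group("secs")))

# Example usage (the log path is hypothetical):
# for ts, node, pct, secs in heartbeat_warnings("ocssd.log", node="node04"):
#     print(ts, node, "%d%% heartbeat fatal, removal in %.3fs" % (pct, secs))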
The following extract from the ocssd.log on node01 shows the details of the node being evicted: node cleanup, the node leaving the cluster, and the rejoining sequence:
2013-03-17 05:25:08.807: [ CSSD][53]clssnmPollingThread: Removal started for node node04 (14),
flags 0x70c0e, state 3, wt4c 0
2013-03-17 05:25:08.807: [ CSSD][53]clssnmMarkNodeForRemoval: node 04, node04 marked for removal
2013-03-17 05:25:08.807: [ CSSD][53]clssnmDiscHelper: node04, node(04) connection failed, endp
(00000000000020c8), probe(0000000000000000), ninf->endp 00000000000020c8
2013-03-17 05:25:08.807: [ CSSD][53]clssnmDiscHelper: node 04 clean up, endp (00000000000020c8),
init state 5, cur state 5
2013-03-17 05:25:08.807: [GIPCXCPT][53] gipcInternalDissociate: obj 600000000246e210
[00000000000020c8] { gipcEndpoint : localAddr '
2013-03-17 05:25:08.821: [ CSSD][1]clssgmCleanupNodeContexts(): cleaning up nodes, rcfg(286)
 