2013-03-17 05:25:08.821: [ CSSD][1]clssgmCleanupNodeContexts(): successful cleanup of nodes rcfg(286)
2013-03-17 05:25:09.724: [ CSSD][56]clssnmDeactivateNode: node 04, state 5
2013-03-17 05:25:09.724: [ CSSD][56]clssnmDeactivateNode: node 04 (node04) left cluster
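Lines like the ones above can be picked out of a large CSSD log programmatically. The following is a minimal Python sketch, assuming the log lines follow exactly the layout shown in this excerpt; the function name `find_departures` and the regular expression are illustrative, not part of any Oracle tool.

```python
import re

# Pattern modeled only on the "left cluster" line shown in the excerpt above;
# other Clusterware versions may format this message differently (assumption).
DEPARTURE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+): "
    r"\[\s*CSSD\]\[\d+\]clssnmDeactivateNode: "
    r"node (?P<num>\d+) \((?P<name>[^)]+)\) left cluster"
)

def find_departures(lines):
    """Return (timestamp, node_number, node_name) for each 'left cluster' line."""
    found = []
    for line in lines:
        match = DEPARTURE.match(line.strip())
        if match:
            found.append((match["ts"], int(match["num"]), match["name"]))
    return found
```

Feeding it the lines of a CSSD log file (for example, via `open(path)`) yields one tuple per node that left the cluster, which gives a starting timestamp for the rest of the investigation.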
Node Evictions—Top/Common Causes and Factors
The following are some of the most common causes of node evictions, sudden death of the cluster stack, reboots, and an unhealthy cluster status:
• Network disruption, latency, or missing network heartbeats
• Delayed or missing disk heartbeats
• Corrupted packets on the network, which can also cause CSS reboots on certain platforms
• Slow or failing interconnect
• Known Oracle Clusterware bugs
• Inability to read, write, or access the majority of the voting disks (files)
• Insufficient resources (CPU/memory starvation) on the node for the OS to schedule key CRS daemon processes
• Manual termination of critical cluster stack daemon background processes (css, cssdagent, cssdmonitor)
• No space left on the device for the GI or /var file system
• Sudden death or hang of the CSSD process
• Excessive resource (CPU, memory, swap) consumption by ORAAGENT/ORAROOTAGENT, resulting in node eviction on specific OS platforms
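One of the causes listed above, a full GI or /var file system, is easy to check ahead of time. The following is a minimal Python sketch using the standard library's `shutil.disk_usage`; the function name `low_space` and the 5% threshold are assumptions chosen for illustration, not Oracle recommendations.

```python
import shutil

def low_space(paths, min_free_fraction=0.05, usage=shutil.disk_usage):
    """Return (path, free_fraction) for each path whose free space is below the threshold.

    min_free_fraction is an illustrative default, not an Oracle-documented limit.
    The usage parameter exists so the check can be exercised without real file systems.
    """
    flagged = []
    for path in paths:
        stats = usage(path)  # object with total, used, free (in bytes)
        if stats.total == 0:
            continue  # skip pseudo file systems that report no capacity
        free_fraction = stats.free / stats.total
        if free_fraction < min_free_fraction:
            flagged.append((path, free_fraction))
    return flagged
```

A call such as `low_space(["/var", gi_home])`, where `gi_home` holds the Grid Infrastructure home path on the node, returns an empty list when both file systems have headroom.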
Gather Crucial Information
Consult the following trace and log files to gather the information needed to diagnose the real cause of a node eviction:
• alert.log: To determine which process actually caused the reboot, refer to the cluster alert.log under the $GI_HOME/log/nodename location. The alert log provides first-hand information for debugging the root cause of the issue. Pay close attention to the component named in each message. If the component shows cssdmonitor or cssdagent, then the node was evicted because resources were unavailable for OS scheduling: either the CPU was 100% busy for a long period, or too much swapping/paging took place due to insufficient memory. If it shows cssagent, then the eviction could be due to network issues.
• ocssd.log: If the node eviction happens due to network failure or latency, or voting disk issues, refer to the ocssd.log file under the $GI_HOME/log/nodename/cssd location.
• cssmonit/cssagent_nodename.lgl: depending on the OS you are on, found in either /etc/oracle/lastgasp or /var/adm/oracle/lastgasp
• oracssdmonitor/oracssdagent_root: under the $GI_HOME/log/nodename/agent/ohasd location
• In addition to the preceding, refer to the OS-specific logs.
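The locations named in the list above all derive from the Grid Infrastructure home and the node name, so they can be assembled in one place before collecting files. The following is a minimal Python sketch; the function name `eviction_log_locations` and the dictionary keys are hypothetical, and the directory layout is taken only from the paths given in this section.

```python
from pathlib import PurePosixPath

def eviction_log_locations(gi_home, nodename):
    """Build the directories this section says to inspect after a node eviction.

    gi_home is the value of $GI_HOME; nodename is the cluster node's name.
    The layout mirrors the paths listed in the text and may differ by release.
    """
    base = PurePosixPath(gi_home) / "log" / nodename
    return {
        "alert_log_dir": str(base),                        # cluster alert log
        "cssd_log_dir": str(base / "cssd"),                # CSSD daemon log
        "ohasd_agent_dir": str(base / "agent" / "ohasd"),  # oracssdmonitor/oracssdagent_root
        # lastgasp location depends on the OS; both candidates are returned
        "lastgasp_dirs": ["/etc/oracle/lastgasp", "/var/adm/oracle/lastgasp"],
    }
```

For example, `eviction_log_locations("/u01/app/grid", "node04")` (both arguments are made-up values) gives the directories to archive from that node before the logs rotate.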