Managing and Optimizing a Complex RAC Environment - Expert Oracle RAC 12c


2013-03-17 05:25:08.821: [ CSSD][1]clssgmCleanupNodeContexts(): successful cleanup of nodes rcfg(286)

2013-03-17 05:25:09.724: [ CSSD][56]clssnmDeactivateNode: node 04, state 5

2013-03-17 05:25:09.724: [ CSSD][56]clssnmDeactivateNode: node 04 (node04) left cluster
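When scanning a long CSSD trace for the moment a node left the cluster, it can help to filter the log programmatically. The following is a minimal sketch, not an Oracle-supplied tool: the regular expressions are inferred only from the sample lines shown above, and the function names (`parse_line`, `departed_nodes`) are hypothetical. Other message layouts in the same log may not match.

```python
import re
from datetime import datetime

# Layout inferred from the sample lines above (assumption): timestamp,
# [component][thread], function name with optional "()", then the message.
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}): "
    r"\[\s*(?P<component>\w+)\]\[(?P<thread>\d+)\]"
    r"(?P<function>\w+)(?:\(\))?: (?P<message>.*)$"
)

# Matches messages such as "node 04 (node04) left cluster".
LEFT_RE = re.compile(r"node (?P<num>\d+) \((?P<name>[^)]+)\) left cluster")


def parse_line(line):
    """Split one log line into fields; return None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    record["ts"] = datetime.strptime(record["ts"], "%Y-%m-%d %H:%M:%S.%f")
    return record


def departed_nodes(lines):
    """Return (timestamp, node number, node name) for each 'left cluster' line."""
    events = []
    for line in lines:
        record = parse_line(line)
        if record is None:
            continue
        left = LEFT_RE.search(record["message"])
        if left:
            events.append((record["ts"], left.group("num"), left.group("name")))
    return events


SAMPLE = [
    "2013-03-17 05:25:08.821: [ CSSD][1]clssgmCleanupNodeContexts(): "
    "successful cleanup of nodes rcfg(286)",
    "2013-03-17 05:25:09.724: [ CSSD][56]clssnmDeactivateNode: node 04, state 5",
    "2013-03-17 05:25:09.724: [ CSSD][56]clssnmDeactivateNode: "
    "node 04 (node04) left cluster",
]

if __name__ == "__main__":
    for ts, num, name in departed_nodes(SAMPLE):
        print(ts.isoformat(), num, name)
```

The timestamp of the departure gives a starting point for correlating the eviction with entries in the other logs.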

Node Evictions—Top/Common Causes and Factors

The following are some of the most common causes of node evictions, sudden death of the cluster stack, reboots, and unhealthy cluster status:

• Network disruption, latency, or missing network heartbeats
• Delayed or missing disk heartbeats
• Corrupted network packets, which may also cause CSS reboots on certain platforms
• Slow or failing interconnect
• Known Oracle Clusterware bugs
• Inability to read, write, or access the majority of the voting disks (files)
• Insufficient resources (CPU/memory starvation) on the node for the OS to schedule key CRS daemon processes
• Manual termination of critical cluster stack daemon background processes (css, cssdagent, cssdmonitor)
• No space left on the device for the GI or /var file system
• Sudden death or hang of the CSSD process
• Excessive resource (CPU, memory, swap) consumption by ORAAGENT/ORAROOTAGENT, resulting in node eviction on specific OS platforms
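One cause in the list above, a full GI or /var file system, is simple to check ahead of time. The sketch below uses only the Python standard library. The 10% free-space threshold and the Grid Infrastructure path in the usage example are illustrative assumptions, not Oracle recommendations, and `low_space_paths` is a hypothetical helper name.

```python
import shutil


def low_space_paths(paths, min_free_fraction=0.10):
    """Return (path, free_fraction) for each path whose file system is
    below the given free-space fraction. Unreadable or missing paths
    are skipped rather than reported."""
    flagged = []
    for path in paths:
        try:
            usage = shutil.disk_usage(path)
        except OSError:
            continue  # path missing or not accessible
        free_fraction = usage.free / usage.total if usage.total else 0.0
        if free_fraction < min_free_fraction:
            flagged.append((path, free_fraction))
    return flagged


if __name__ == "__main__":
    # "/u01/app/grid" is a placeholder; substitute the real GI home.
    for path, free in low_space_paths(["/var", "/u01/app/grid"]):
        print(f"{path}: only {free:.1%} free")
```

A check like this could run from a scheduler on each node so that a filling file system is noticed before it affects the cluster stack.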

Gather Crucial Information

Consult the following trace and log files to gather the information needed to diagnose the real cause of a node eviction:

• alert.log: To determine which process actually caused the reboot, refer to the cluster alert.log under the $GI_HOME/log/nodename location. The alert log provides first-hand information for debugging the root cause of the issue. Pay close attention to the component named in the messages. If the component is cssdmonitor or cssdagent, the node was evicted because resources were unavailable for OS scheduling: either the CPU was 100% busy for a long period, or too much swapping/paging took place due to insufficient memory. If it shows cssagent, the eviction could be due to network issues.
• ocssd.log: If the node eviction happened due to network failure or latency, or voting disk issues, refer to the ocssd.log file under the $GI_HOME/log/nodename/cssd location.
• cssmonit/cssagent_nodename.lgl: located, depending on the OS, in either /etc/oracle/lastgasp or /var/adm/oracle/lastgasp.
• oracssdmonitor/oracssdagent_root: under the $GI_HOME/log/nodename/agent/ohasd location.

In addition to the preceding, refer to the OS-specific logs.
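The locations listed above can be collected into a short helper, so that the same set of directories is checked after every eviction. This is a minimal sketch under stated assumptions: the directory layout is taken from the list above, the function names are hypothetical, and the component hints simply restate the guidance given for the alert log, with component names spelled as in the daemon list earlier in this section.

```python
import os


def candidate_log_dirs(gi_home, nodename):
    """Build the directories named in the text for one node."""
    base = os.path.join(gi_home, "log", nodename)
    return [
        base,                                  # cluster alert log
        os.path.join(base, "cssd"),            # CSSD log
        os.path.join(base, "agent", "ohasd"),  # oracssdmonitor / oracssdagent_root
        "/etc/oracle/lastgasp",                # lastgasp files (OS dependent)
        "/var/adm/oracle/lastgasp",            # lastgasp files (OS dependent)
    ]


def existing_dirs(dirs):
    """Keep only the directories that are present on this host."""
    return [d for d in dirs if os.path.isdir(d)]


# Restates the alert-log guidance from the text; not an exhaustive mapping.
COMPONENT_HINTS = {
    "cssdmonitor": "resources unavailable for OS scheduling (CPU or memory)",
    "cssdagent": "resources unavailable for OS scheduling (CPU or memory)",
    "cssagent": "possible network issues",
}


def likely_cause(component):
    """Return a first hint for the component named in the alert log."""
    key = component.strip().lower()
    return COMPONENT_HINTS.get(key, "no hint; inspect the logs directly")


if __name__ == "__main__":
    # GI_HOME is read from the environment; "node04" is a placeholder.
    gi_home = os.environ.get("GI_HOME", "")
    for directory in existing_dirs(candidate_log_dirs(gi_home, "node04")):
        print(directory)
```

The hint returned by `likely_cause` is only a starting point; the logs in the listed directories remain the evidence for the actual root cause.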
