Split-Brain Scenarios and How to Avoid Them
Split-brain scenarios are a RAC DBA or DMA's worst nightmare. They are closely associated with a few related
terms: node evictions, fencing, STONITH (Shoot The Other Node In The Head), etc.
The term split-brain scenario originates from its namesake in the medical world: split-brain syndrome, a
consequence of the connection between the two hemispheres of the brain being severed, which results in significant
impairment of normal brain function. A split-brain scenario in a RAC cluster can be defined as the functional overlap
of separate instances, each operating in its own world without any guaranteed consistency or coordination between them.
Basically, communication is lost among the nodes of the cluster, and the affected nodes are evicted (kicked out of the
cluster), resulting in node/instance reboots. This can happen for a variety of reasons, ranging from hardware failure on
the private cluster interconnect to nodes becoming unresponsive because of CPU/memory starvation.
The next section covers the anatomy of node evictions in detail.
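To make the idea of instances "each in its own world" concrete, the following is a minimal sketch that groups nodes into sub-clusters according to which interconnect links still work. It is a toy model, not Oracle Clusterware's actual membership logic; the function names, node names, and link lists are hypothetical and exist only for illustration.

```python
from collections import defaultdict


def find_subclusters(nodes, healthy_links):
    """Group nodes into sub-clusters that can still reach each other.

    nodes         -- iterable of node names (hypothetical, e.g. "node1")
    healthy_links -- iterable of (node_a, node_b) pairs whose private
                     interconnect link is still working
    """
    neighbours = defaultdict(set)
    for a, b in healthy_links:
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen = set()
    subclusters = []
    for start in sorted(nodes):
        if start in seen:
            continue
        # Depth-first walk over the links that still work.
        group, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(neighbours[node] - group)
        seen |= group
        subclusters.append(sorted(group))
    return subclusters


def is_split_brain(nodes, healthy_links):
    """More than one sub-cluster means the nodes no longer agree on membership."""
    return len(find_subclusters(nodes, healthy_links)) > 1
```

With four nodes and only the node1-node2 and node3-node4 links alive, `find_subclusters` returns two groups and `is_split_brain` reports True; this is the situation in which a real cluster must fence or evict one side before both sides modify shared data.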
Split-brain situations generally translate into a painful experience for the business and IT organizations, mainly
because of the unplanned downtime and the corresponding impact on the business involved. Figure 7-9 shows a
typical split-brain scenario:
Figure 7-9. Typical split-brain scenario within an Oracle RAC cluster
Here are some best practices that you can employ to eliminate or at least mitigate potential split-brain scenarios.
They also serve as good performance tuning tips and tricks (a well-tuned system has significantly lower
potential for failure in general, including split-brain conditions):
Establish redundancy at the networking tier: redundant switches/Network Interface Cards
(NICs) trunked/bonded/teamed together. The failover of these components must be thoroughly
tested to confirm that failover times do not exceed the communication thresholds beyond which
RAC treats the situation as a split-brain scenario.
Allocate enough CPU for the application workloads and establish limits for CPU consumption.
CPU exhaustion can lead to a point where the node becomes unresponsive to the
other nodes of the cluster, resulting in a split-brain scenario that leads in turn to node evictions.
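The failover-testing advice above can be turned into a simple pass/fail check. The sketch below compares measured NIC/switch failover times against the cluster's heartbeat timeout, keeping a safety margin. The function name, the `safety_factor` parameter, and the sample numbers are assumptions made for illustration; the timeout you pass in should be the value actually configured in your cluster (for Oracle Clusterware, the CSS misscount setting, which can be read with `crsctl get css misscount`).

```python
def failover_is_safe(measured_failover_seconds, heartbeat_timeout_seconds,
                     safety_factor=0.5):
    """Return True if every measured failover finished well inside the timeout.

    measured_failover_seconds -- failover durations observed during testing
    heartbeat_timeout_seconds -- the cluster's network heartbeat timeout
    safety_factor             -- hypothetical margin: the fraction of the
                                 timeout that a failover is allowed to use
    """
    if not measured_failover_seconds:
        raise ValueError("no failover measurements supplied")
    budget = heartbeat_timeout_seconds * safety_factor
    # The slowest observed failover is the one that matters.
    return max(measured_failover_seconds) <= budget


# Example with made-up measurements and an assumed 30-second timeout.
print(failover_is_safe([2.1, 3.4, 2.8], heartbeat_timeout_seconds=30))
print(failover_is_safe([2.1, 18.0], heartbeat_timeout_seconds=30))
```

The check uses the worst case rather than the average, because a single slow failover is enough to trigger an eviction.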
 