This is the most common cause of node evictions. Cap CPU consumption through workload management technologies such as Database Resource Manager (DBRM), instance caging, and cluster-managed database services.
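As a sketch of the DBRM approach, the following creates a plan that caps a hypothetical REPORTING consumer group at 50% of CPU; the plan, group, and percentage are illustrative values, not taken from the text.

```sql
-- Sketch: a DBRM plan capping an illustrative REPORTING group at 50% CPU.
BEGIN
  DBMS_RESOURCE_MANAGER.CREATE_PENDING_AREA();
  DBMS_RESOURCE_MANAGER.CREATE_CONSUMER_GROUP(
    consumer_group => 'REPORTING',
    comment        => 'Ad-hoc reporting sessions');
  DBMS_RESOURCE_MANAGER.CREATE_PLAN(
    plan    => 'DAYTIME_PLAN',
    comment => 'Cap CPU for ad-hoc reporting');
  DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(
    plan             => 'DAYTIME_PLAN',
    group_or_subplan => 'REPORTING',
    comment          => 'Limit reporting to 50% CPU',
    mgmt_p1          => 50);
  DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(
    plan             => 'DAYTIME_PLAN',
    group_or_subplan => 'OTHER_GROUPS',
    comment          => 'Everything else',
    mgmt_p1          => 50);
  DBMS_RESOURCE_MANAGER.VALIDATE_PENDING_AREA();
  DBMS_RESOURCE_MANAGER.SUBMIT_PENDING_AREA();
END;
/
-- Activate the plan so the CPU directives are enforced.
ALTER SYSTEM SET resource_manager_plan = 'DAYTIME_PLAN';
```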
Allocate enough memory for the various applications and establish limits on memory consumption. Automatic Memory Management (AMM) comes in handy, as it places limits on both the SGA and the Program Global Area (PGA). With 11g, this can prove to be a bit of a challenge if you are using Automatic Shared Memory Management (ASMM) with HugePages on the Linux family of OS (AMM is not compatible with HugePages), especially if you are dealing with applications that tend to be ill-behaved. With 12c, a new parameter called PGA_AGGREGATE_LIMIT has been introduced to rectify this problem; it caps the total amount of PGA that an instance uses.
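The contrast can be shown in two statements; the sizes below are illustrative, not recommendations.

```sql
-- 12c and later: hard cap on total instance PGA (value is illustrative).
ALTER SYSTEM SET pga_aggregate_limit = 8G SCOPE=BOTH SID='*';

-- 11g with ASMM: only a soft target is available, which ill-behaved
-- workloads can exceed.
-- ALTER SYSTEM SET pga_aggregate_target = 4G SCOPE=BOTH SID='*';
```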
Employ DBRM along with IORM (I/O Resource Manager) if you are operating RAC clusters on Exadata. In the absence of resource consumption limits, a single rogue user with one or more runaway queries (typically in the ad-hoc query world) can run away with all your resources, leaving the system starved for CPU/memory and unresponsive; this leads to repeated node evictions and split-brain scenarios. After the instances/nodes have been rebooted, the same jobs can queue up again and the behavior repeats. This point has been alluded to in the preceding points as well.
Set up and configure instance caging (the CPU_COUNT parameter) for multi-tenant database RAC nodes, and watch for RESMGR:CPU quantum waits related to instance caging, especially when instances have been overconsolidated, for example, within an Exadata environment. If RESMGR:CPU quantum waits are observed, the dynamic CPU_COUNT parameter can temporarily be increased to relieve the pressure points in a RAC instance, provided enough CPU is available for all the instances on the RAC node.
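A minimal sketch of the steps above, with an illustrative CPU cap:

```sql
-- Instance caging: cap this instance at 4 CPUs (value is illustrative)
-- and enable a resource plan so the cap is actually enforced.
ALTER SYSTEM SET cpu_count = 4 SCOPE=BOTH;
ALTER SYSTEM SET resource_manager_plan = 'DEFAULT_PLAN' SCOPE=BOTH;

-- Monitor for caging-related waits; if they climb and spare CPU exists
-- on the node, CPU_COUNT can be raised dynamically.
SELECT event, total_waits, time_waited
FROM   v$system_event
WHERE  event = 'resmgr:cpu quantum';
```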
Ensure that no antivirus software is active or present on any of the RAC nodes of the cluster. Antivirus software can interfere with the internal workings of LMS and other RAC processes and block their normal activity; in turn, this can result in excessive CPU usage, ultimately driven all the way to 100% consumption, leaving RAC nodes unresponsive and causing them to be evicted from the cluster.
Patch to the latest versions of the Oracle Database software. Many bugs associated with various versions are known to cause split-brain scenarios. Staying current with the latest CPU/PSU patches is known to mitigate stability and performance issues.
Avoid allocating/configuring an excessive number of LMS processes (controlled by the GCS_SERVER_PROCESSES parameter). LMS is a CPU-intensive process and, if not configured properly, can cause CPU starvation to occur very rapidly, ultimately resulting in node evictions.
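To check the configured versus actual LMS count on an instance (assuming GCS_SERVER_PROCESSES is the governing parameter, as noted above):

```sql
-- Configured number of global cache service (LMS) processes.
SELECT name, value
FROM   v$parameter
WHERE  name = 'gcs_server_processes';

-- LMS processes actually running on this instance.
SELECT COUNT(*) AS lms_count
FROM   v$process
WHERE  pname LIKE 'LMS%';
```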
Partition large objects to reduce I/O and improve overall performance. This eases the load on
CPU/memory consumption, resulting in more efficient use of resources, thereby mitigating
CPU/memory starvation scenarios that ultimately result in unresponsive nodes.
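A sketch of the partitioning idea, using interval range partitioning so queries filtered by date prune to the relevant partitions; the table and column names are illustrative.

```sql
-- Sketch: monthly interval partitioning on a date column reduces I/O
-- via partition pruning (names and dates are illustrative).
CREATE TABLE sales_fact (
  sale_id   NUMBER,
  sale_date DATE,
  amount    NUMBER
)
PARTITION BY RANGE (sale_date)
INTERVAL (NUMTOYMINTERVAL(1, 'MONTH'))
(
  PARTITION p_initial VALUES LESS THAN (DATE '2014-01-01')
);
```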
Parallelization and AUTO DOP: set up, configure, and tune carefully. Turning on Automatic Degree of Parallelism (AUTO DOP; PARALLEL_DEGREE_POLICY=AUTO) can have negative consequences for RAC performance, especially if the various PARALLEL parameters are not configured properly. For example, an implicit feature of AUTO DOP is in-memory parallel execution, which qualifies large objects for direct path reads; under ASMM, where PGA usage is governed only by the soft PGA_AGGREGATE_TARGET, this can translate into unlimited use of the PGA, ultimately resulting in memory starvation, unresponsive nodes, and eviction from the cluster. High settings of PARALLEL_MAX_SERVERS can have a very similar memory-starvation effect. These are just a few examples, underscoring the need for careful configuration of the parallel execution init.ora parameters within a RAC cluster.
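One conservative starting point, keeping AUTO DOP off and bounding parallel server usage; the values below are illustrative and must be sized against the node's CPU and memory.

```sql
-- Conservative parallel execution settings (values are illustrative).
ALTER SYSTEM SET parallel_degree_policy  = 'MANUAL' SCOPE=BOTH SID='*';
ALTER SYSTEM SET parallel_max_servers    = 64       SCOPE=BOTH SID='*';
ALTER SYSTEM SET parallel_servers_target = 32       SCOPE=BOTH SID='*';
```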