This is the most common cause of node evictions. Cap CPU consumption through workload management technologies such as Database Resource Manager (DBRM), instance caging, and cluster-managed database services.
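As a sketch of the DBRM approach, the following creates a plan that caps a hypothetical REPORTING consumer group at 50% of CPU; the plan, group, and percentage are illustrative values, not taken from the text.

```sql
-- Sketch: a DBRM plan capping an illustrative REPORTING group at 50% CPU.
BEGIN
  DBMS_RESOURCE_MANAGER.CREATE_PENDING_AREA();
  DBMS_RESOURCE_MANAGER.CREATE_CONSUMER_GROUP(
    consumer_group => 'REPORTING',
    comment        => 'Ad-hoc reporting sessions');
  DBMS_RESOURCE_MANAGER.CREATE_PLAN(
    plan    => 'DAYTIME_PLAN',
    comment => 'Cap CPU for ad-hoc reporting');
  DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(
    plan             => 'DAYTIME_PLAN',
    group_or_subplan => 'REPORTING',
    comment          => 'Limit reporting to 50% CPU',
    mgmt_p1          => 50);
  DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(
    plan             => 'DAYTIME_PLAN',
    group_or_subplan => 'OTHER_GROUPS',
    comment          => 'Everything else',
    mgmt_p1          => 50);
  DBMS_RESOURCE_MANAGER.VALIDATE_PENDING_AREA();
  DBMS_RESOURCE_MANAGER.SUBMIT_PENDING_AREA();
END;
/
-- Activate the plan so the CPU directives are enforced.
ALTER SYSTEM SET resource_manager_plan = 'DAYTIME_PLAN';
```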
Allocate enough memory for the various applications and establish limits on memory consumption. Automatic Memory Management (AMM) comes in handy, as it places limits on both the SGA and the Program Global Area (PGA). With 11g, this can prove to be a bit of a challenge if you are using Automatic Shared Memory Management (ASMM) with HugePages on the Linux family of OS (AMM is not compatible with HugePages), especially if you are dealing with applications that tend to be ill-behaved. With 12c, a new parameter called PGA_AGGREGATE_LIMIT has been introduced to rectify this problem; it caps the total amount of PGA that an instance uses.
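The contrast can be shown in two statements; the sizes below are illustrative, not recommendations.

```sql
-- 12c and later: hard cap on total instance PGA (value is illustrative).
ALTER SYSTEM SET pga_aggregate_limit = 8G SCOPE=BOTH SID='*';

-- 11g with ASMM: only a soft target is available, which ill-behaved
-- workloads can exceed.
-- ALTER SYSTEM SET pga_aggregate_target = 4G SCOPE=BOTH SID='*';
```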
Employ DBRM along with IORM (I/O Resource Manager) if you are operating RAC clusters on Exadata. In the absence of resource consumption limits, a single rogue user with one or more runaway queries (typically in the ad-hoc query world) can run away with all your resources, leaving the system starved for CPU/memory and unresponsive; this leads to repeated node evictions and split-brain scenarios. After the instances/nodes have been rebooted, the same jobs can queue up again and the behavior repeats. This point has been alluded to in the preceding points as well.
Set up and configure instance caging (the CPU_COUNT parameter) for multi-tenant database RAC nodes, and watch for RESMGR:CPU quantum waits related to instance caging, especially when instances have been overconsolidated, for example, within an Exadata environment. If RESMGR:CPU quantum waits are observed, the dynamic CPU_COUNT parameter can temporarily be increased to relieve the pressure points in a RAC instance, provided enough CPU is available for all the instances on the RAC node.
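A minimal sketch of the steps above, with an illustrative CPU cap:

```sql
-- Instance caging: cap this instance at 4 CPUs (value is illustrative)
-- and enable a resource plan so the cap is actually enforced.
ALTER SYSTEM SET cpu_count = 4 SCOPE=BOTH;
ALTER SYSTEM SET resource_manager_plan = 'DEFAULT_PLAN' SCOPE=BOTH;

-- Monitor for caging-related waits; if they climb and spare CPU exists
-- on the node, CPU_COUNT can be raised dynamically.
SELECT event, total_waits, time_waited
FROM   v$system_event
WHERE  event = 'resmgr:cpu quantum';
```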
Ensure that no antivirus software is active or present on any of the RAC nodes of the cluster. Antivirus software can interfere with the internal workings of LMS and other RAC processes and block their normal activity; in turn, this can result in excessive CPU usage, ultimately driven all the way to 100% consumption, leaving RAC nodes unresponsive and causing them to be evicted from the cluster.
Patch to the latest versions of the Oracle Database software. Many bugs associated with various versions are known to cause split-brain scenarios. Staying current with the latest CPU/PSU patches is known to mitigate stability and performance issues.
Avoid allocating/configuring an excessive number of LMS processes (controlled by the GCS_SERVER_PROCESSES parameter). LMS is a CPU-intensive process and, if not configured properly, can cause CPU starvation to occur very rapidly, ultimately resulting in node evictions.
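To check the configured versus actual LMS count on an instance (assuming GCS_SERVER_PROCESSES is the governing parameter, as noted above):

```sql
-- Configured number of global cache service (LMS) processes.
SELECT name, value
FROM   v$parameter
WHERE  name = 'gcs_server_processes';

-- LMS processes actually running on this instance.
SELECT COUNT(*) AS lms_count
FROM   v$process
WHERE  pname LIKE 'LMS%';
```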
Partition large objects to reduce I/O and improve overall performance. This eases the load on
CPU/memory consumption, resulting in more efficient use of resources, thereby mitigating
CPU/memory starvation scenarios that ultimately result in unresponsive nodes.
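A sketch of the partitioning idea, using interval range partitioning so queries filtered by date prune to the relevant partitions; the table and column names are illustrative.

```sql
-- Sketch: monthly interval partitioning on a date column reduces I/O
-- via partition pruning (names and dates are illustrative).
CREATE TABLE sales_fact (
  sale_id   NUMBER,
  sale_date DATE,
  amount    NUMBER
)
PARTITION BY RANGE (sale_date)
INTERVAL (NUMTOYMINTERVAL(1, 'MONTH'))
(
  PARTITION p_initial VALUES LESS THAN (DATE '2014-01-01')
);
```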
Parallelization and AUTO DOP: set up, configure, and tune carefully. Turning on Automatic Degree of Parallelism (AUTO DOP; PARALLEL_DEGREE_POLICY=AUTO) can have negative consequences for RAC performance, especially if the various PARALLEL parameters are not configured properly. For example, an implicit feature of AUTO DOP is in-memory parallel execution, which qualifies large objects for direct path reads; under ASMM, where PGA usage is governed only by the soft PGA_AGGREGATE_TARGET, this can translate into unlimited use of the PGA, ultimately resulting in memory starvation, unresponsive nodes, and eviction from the cluster. High settings of PARALLEL_MAX_SERVERS can have a very similar memory-starvation effect. These are just a few examples, underscoring the need for careful configuration of the parallel execution init.ora parameters within a RAC cluster.
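One conservative starting point, keeping AUTO DOP off and bounding parallel server usage; the values below are illustrative and must be sized against the node's CPU and memory.

```sql
-- Conservative parallel execution settings (values are illustrative).
ALTER SYSTEM SET parallel_degree_policy  = 'MANUAL' SCOPE=BOTH SID='*';
ALTER SYSTEM SET parallel_max_servers    = 64       SCOPE=BOTH SID='*';
ALTER SYSTEM SET parallel_servers_target = 32       SCOPE=BOTH SID='*';
```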