Top 5 Timed Foreground Events

Event                      Waits    Time(s)    Avg wait (ms)    % DB time    Wait Class
gc cr block lost             557        293              526        62.54    Cluster
DB CPU                                  110                         23.52
db file sequential read    3,806         26                7         5.44    User I/O
log file sync              1,154         12               10         2.52    Commit
DFS lock handle            5,400          7                1         1.44    Other

Figure 9-8. Top five timed foreground events
Blocks can be lost for numerous reasons, such as network drops, latencies in CPU scheduling, incorrect network configuration, and insufficient network bandwidth. When a process is scheduled to run on a CPU, that process drains the socket buffers into application buffers. If there are delays in CPU scheduling, the process might not be able to read the socket buffers quickly enough. If the rate of packets arriving in that process's socket buffers exceeds the rate at which they are drained, the kernel threads encounter a "buffer full" condition and silently drop incoming packets. As a result, not all of the IP packets that make up a datagram arrive, leading to UDP segment reassembly failures.
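One quick way to see whether the operating system is actually dropping UDP packets is to look at the kernel's own counters. The commands below are a minimal sketch for Linux; the counter labels (for example, "packet receive errors" and RcvbufErrors) vary somewhat between kernel versions, so treat them as illustrative.

    # UDP-level statistics; steadily increasing receive/buffer error counters
    # suggest that socket buffers are overflowing and packets are being dropped.
    netstat -su

    # The same counters as raw columns, convenient for scripted monitoring.
    grep -i '^Udp' /proc/net/snmp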
If there are lost block issues and the DB time spent on lost block waits is not negligible, you should determine whether there was any CPU starvation on the server during the problem period. I have seen severe latch contention cause higher CPU usage and lead to lost block issues. Also, kernel threads have a higher CPU priority than user foreground processes, so even while the foreground processes are suffering from CPU starvation, the kernel threads can still receive network frames from the physical link layer, yet fail to copy those frames into the socket buffers because of "buffer full" conditions.
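Ordinary OS tools are usually sufficient to check for CPU starvation during the problem window. The following is a sketch for Linux; the sampling interval and count are arbitrary choices.

    # Sample CPU and run-queue statistics every 5 seconds, 3 times.
    # A run queue ('r' column) persistently higher than the CPU count,
    # or very low idle time ('id'), points to CPU starvation.
    vmstat 5 3

    # Historical run-queue view, if sysstat data collection is enabled.
    sar -q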
Network kernel parameters must be configured properly if the workload induces a high rate of cluster interconnect traffic. For example, the rmem_max and wmem_max kernel parameters (Linux) might need to be increased to reduce the probability of buffer full conditions.
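A minimal sketch of such a change on Linux is shown below. The values are illustrative starting points only, not a recommendation; check the Oracle installation documentation for the settings appropriate to your release and interconnect workload.

    # /etc/sysctl.conf (or a file under /etc/sysctl.d/)
    net.core.rmem_max     = 4194304
    net.core.wmem_max     = 1048576
    net.core.rmem_default = 262144
    net.core.wmem_default = 262144

    # Apply and verify without a reboot.
    sysctl -p
    sysctl net.core.rmem_max net.core.wmem_max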
Memory starvation is another issue that can trigger a slew of lost block waits. Memory starvation can lead to swapping or paging on the server, affecting the performance of foreground processes. The foreground processes then become inefficient at draining the socket buffers and might not be able to drain the network buffers quickly enough, leading to "buffer full" conditions. On the Linux platform, HugePages is an important feature to implement; without HugePages configured, kernel threads can consume all available CPUs during memory starvation, causing the foreground processes to suffer from CPU starvation and lost block issues.
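To confirm whether HugePages are configured and actually in use on a node, you can inspect /proc/meminfo. The kernel parameter shown below reserves the pages at boot; its value here is purely illustrative and must be sized to cover the combined SGAs of all instances on the node.

    # HugePages_Total > 0, with HugePages_Free well below HugePages_Total,
    # indicates that HugePages are configured and the SGA is using them.
    grep -i huge /proc/meminfo

    # /etc/sysctl.conf entry that reserves HugePages at boot (example value).
    vm.nr_hugepages = 2048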
In summary, identify whether the DB time spent on lost block waits is significant. Check that resource consumption on the database server(s) is healthy, verify that the network-related kernel parameters are configured properly, and verify that the routes between the nodes are optimal. If addressing these areas does not resolve the root cause, then review link/switch layer statistics to check for errors.
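For the link/switch layer, the interface and driver counters are the usual starting point. The commands below are a sketch for Linux; eth1 stands in for whatever interface carries your private interconnect, and the statistic names reported by ethtool depend on the NIC driver.

    # Per-interface error and drop counters for all interfaces.
    netstat -i
    ip -s link show eth1

    # Driver/NIC-level statistics, including ring buffer overruns and CRC errors.
    ethtool -S eth1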
Configuring Network for Oracle RAC and Clusterware
As we discussed at the beginning of this chapter, Oracle RAC and Oracle Grid Infrastructure require three types of network configuration: the public network, the private network, and the storage network. The public network is primarily for database clients to connect to the Oracle database. The public network may also connect the database servers to the corporate network, through which applications and other database clients can connect to the database. The private network is the interconnect network among the nodes of the cluster only. It carries the cluster interconnect heartbeats for Oracle Clusterware and the communication between the RAC nodes for Oracle RAC cache fusion.
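You can see how Oracle Clusterware classifies each interface with the oifcfg utility. The interface names and subnets in this sketch are examples only; your output will reflect your own network layout.

    # List the stored network interfaces and their roles.
    $ oifcfg getif
    eth0  10.5.12.0     global  public
    eth1  192.168.10.0  global  cluster_interconnect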
 