Top 5 Timed Foreground Events

Event                      Waits    Time(s)    Avg wait (ms)    % DB time    Wait Class
gc cr block lost             557        293              526        62.54    Cluster
DB CPU                                  110                         23.52
db file sequential read    3,806         26                7         5.44    User I/O
log file sync              1,154         12               10         2.52    Commit
DFS lock handle            5,400          7                1         1.44    Other

Figure 9-8. Top five timed foreground events
Blocks can be lost for numerous reasons, such as network drops, latencies in CPU scheduling, incorrect network configuration, and insufficient network bandwidth. When a process is scheduled to run on a CPU, that process drains the socket buffers into application buffers. If there are delays in CPU scheduling, the process might not be able to read the socket buffers quickly enough. If the rate of packets arriving in that process's socket buffers exceeds the rate at which they are drained, the kernel threads encounter a "buffer full" condition and silently drop incoming packets. As a result, not all of the IP packets that make up a datagram arrive, leading to UDP segment reassembly failures.
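One quick way to see whether the operating system is actually dropping UDP packets is to look at the kernel's own counters. The commands below are a minimal sketch for Linux; the counter labels (for example, "packet receive errors" and RcvbufErrors) vary somewhat between kernel versions, so treat them as illustrative.

    # UDP-level statistics; steadily increasing receive/buffer error counters
    # suggest that socket buffers are overflowing and packets are being dropped.
    netstat -su

    # The same counters as raw columns, convenient for scripted monitoring.
    grep -i '^Udp' /proc/net/snmp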
If there are lost block issues and the DB time spent on lost block waits is not negligible, you should determine whether there was any CPU starvation on the server during the problem period. I have seen severe latch contention cause higher CPU usage and lead to lost block issues. Also, kernel threads have a higher CPU priority than user foreground processes, so even while the foreground processes are suffering from CPU starvation, the kernel threads can still receive network frames from the physical link layer, yet fail to copy those frames into the socket buffers because of "buffer full" conditions.
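Ordinary OS tools are usually sufficient to check for CPU starvation during the problem window. The following is a sketch for Linux; the sampling interval and count are arbitrary choices.

    # Sample CPU and run-queue statistics every 5 seconds, 3 times.
    # A run queue ('r' column) persistently higher than the CPU count,
    # or very low idle time ('id'), points to CPU starvation.
    vmstat 5 3

    # Historical run-queue view, if sysstat data collection is enabled.
    sar -q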
Network kernel parameters must be configured properly if the workload induces a high rate of cluster interconnect traffic. For example, the rmem_max and wmem_max kernel parameters (Linux) might need to be increased to reduce the probability of buffer full conditions.
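A minimal sketch of such a change on Linux is shown below. The values are illustrative starting points only, not a recommendation; check the Oracle installation documentation for the settings appropriate to your release and interconnect workload.

    # /etc/sysctl.conf (or a file under /etc/sysctl.d/)
    net.core.rmem_max     = 4194304
    net.core.wmem_max     = 1048576
    net.core.rmem_default = 262144
    net.core.wmem_default = 262144

    # Apply and verify without a reboot.
    sysctl -p
    sysctl net.core.rmem_max net.core.wmem_max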
Memory starvation is another issue that can trigger a slew of lost block waits. Memory starvation can lead to swapping or paging on the server, affecting the performance of foreground processes. The foreground processes then become inefficient at draining the socket buffers and might not be able to drain the network buffers quickly enough, leading to "buffer full" conditions. On the Linux platform, HugePages is an important feature to implement; without HugePages configured, kernel threads can consume all available CPUs during memory starvation, causing the foreground processes to suffer from CPU starvation and lost block issues.
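To confirm whether HugePages are configured and actually in use on a node, you can inspect /proc/meminfo. The kernel parameter shown below reserves the pages at boot; its value here is purely illustrative and must be sized to cover the combined SGAs of all instances on the node.

    # HugePages_Total > 0, with HugePages_Free well below HugePages_Total,
    # indicates that HugePages are configured and the SGA is using them.
    grep -i huge /proc/meminfo

    # /etc/sysctl.conf entry that reserves HugePages at boot (example value).
    vm.nr_hugepages = 2048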
In summary, identify whether the DB time spent on lost block waits is significant. Check that resource consumption on the database server(s) is healthy, verify that the network-related kernel parameters are configured properly, and verify that the routes between the nodes are optimal. If addressing these areas does not resolve the root cause, then review link/switch layer statistics to check for errors.
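For the link/switch layer, the interface and driver counters are the usual starting point. The commands below are a sketch for Linux; eth1 stands in for whatever interface carries your private interconnect, and the statistic names reported by ethtool depend on the NIC driver.

    # Per-interface error and drop counters for all interfaces.
    netstat -i
    ip -s link show eth1

    # Driver/NIC-level statistics, including ring buffer overruns and CRC errors.
    ethtool -S eth1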
Configuring Network for Oracle RAC and Clusterware
As we discussed at the beginning of this chapter, Oracle RAC and Oracle Grid Infrastructure require three types of network configuration: the public network, the private network, and the storage network. The public network is primarily for database clients to connect to the Oracle database. The public network may also connect the database servers to the corporate network, through which applications and other database clients can connect to the database. The private network is the interconnect network among the nodes of the cluster only. It carries the cluster interconnect heartbeats for Oracle Clusterware and the communication between the RAC nodes for Oracle RAC cache fusion.
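You can see how Oracle Clusterware classifies each interface with the oifcfg utility. The interface names and subnets in this sketch are examples only; your output will reflect your own network layout.

    # List the stored network interfaces and their roles.
    $ oifcfg getif
    eth0  10.5.12.0     global  public
    eth1  192.168.10.0  global  cluster_interconnect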
 