Network Practices - Expert Oracle RAC 12c

Example Problem

The following netstat output was captured on a server that was experiencing IP reassembly errors. About 1.48 billion packets were received, of which about 1.39 billion were delivered to upper-layer protocols, and about 1.05 billion packets were sent out. About 107 million reassemblies were required, and about 1.7 million reassemblies failed. This output is from an OSWatcher file, and only the first sample is shown here. Because there were numerous reassembly failures, this server requires further review to understand the root cause.

Ip:
    1478633885 total packets received        <-- Total packets received so far
    28039 with invalid addresses
    0 forwarded
    0 incoming packets discarded
    1385377694 incoming packets delivered    <-- Packets delivered to upper-layer protocols
    1045105164 requests sent out             <-- Total packets sent so far
    6 outgoing packets dropped
    57 dropped because of missing route
    3455 fragments dropped after timeout
    106583778 reassemblies required          <-- Reassembly required
    32193658 packets reassembled ok
    1666996 packet reassembles failed        <-- Failed reassembly
    33504302 fragments received ok
    111652305 fragments created

There are many possible reasons for reassembly failures. Further analysis with tools such as tcpdump or Wireshark is required to understand the root cause. In most cases, troubleshooting reassembly failures is a collaborative effort among DBAs, network administrators, and system administrators.
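Before reaching for a packet capture, it can help to quantify the counters. The following is a minimal Python sketch, not part of OSWatcher or netstat, that parses the Ip section of netstat -s style text and computes the share of failed reassemblies. The function names (parse_ip_counters, reassembly_failure_pct, counter_deltas) are hypothetical, and the parser assumes the Linux layout shown above, with counters indented under a section header.

```python
import re

# Counters copied from the sample above (annotations removed).
SAMPLE = """\
Ip:
    1478633885 total packets received
    28039 with invalid addresses
    0 forwarded
    0 incoming packets discarded
    1385377694 incoming packets delivered
    1045105164 requests sent out
    6 outgoing packets dropped
    57 dropped because of missing route
    3455 fragments dropped after timeout
    106583778 reassemblies required
    32193658 packets reassembled ok
    1666996 packet reassembles failed
    33504302 fragments received ok
    111652305 fragments created
"""


def parse_ip_counters(text):
    """Return {description: count} for the 'Ip:' section of the given text."""
    counters = {}
    in_ip = False
    for line in text.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace():
            # Unindented lines are section headers such as "Ip:" or "Tcp:".
            in_ip = line.strip() == "Ip:"
            continue
        if in_ip:
            match = re.match(r"\s*(\d+)\s+(.*\S)", line)
            if match:
                counters[match.group(2)] = int(match.group(1))
    return counters


def reassembly_failure_pct(counters):
    """Failed reassemblies as a percentage of reassemblies required."""
    required = counters["reassemblies required"]
    failed = counters["packet reassembles failed"]
    return 100.0 * failed / required if required else 0.0


def counter_deltas(earlier, later):
    """Growth of each counter between two samples (assumes no counter reset)."""
    return {name: later[name] - earlier[name] for name in later if name in earlier}


if __name__ == "__main__":
    ip = parse_ip_counters(SAMPLE)
    print(f"reassembly failures: {reassembly_failure_pct(ip):.2f}% of reassemblies required")
```

For the sample above, this reports that roughly 1.6 percent of the required reassemblies failed. These counters are cumulative since boot, so the ratio is only a rough indicator. Comparing successive OSWatcher samples with counter_deltas shows whether failures are still accumulating.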

GC Lost Block Issue

If there is no response to a block request within 0.5 seconds, the block is declared lost and the lost block-related statistics are incremented. Reasons for a lost block condition include CPU scheduling issues, incorrect network buffer configuration, resource shortages on the server, and network configuration issues. It is a very common mistake to declare a gc lost block issue a network issue: you must review statistics from both the database and OS perspectives to understand the root cause.

Oracle RAC uses the algorithm shown in Figure 9-7 to detect lost blocks. First, a foreground process sends a request to the LMS process running on the remote node. Until a response is received from the LMS process, the foreground process cannot continue. So, the foreground process sets up a timer with an expiry time of 0.5 seconds (on the Windows platform, this timer expiry is set to 6 seconds) and goes to sleep. If a block, or a grant to read a block, is received by the foreground process, then the foreground process continues processing. If no response is received within the 0.5-second interval, then the alarm wakes up the foreground process, which declares the block lost and accounts the wait time to gc cr block lost or gc current block lost. The foreground process then resubmits the request for the same block to the LMS process. The foreground process can get stuck waiting for a block in this loop if there is a problem sending or receiving a block.
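The request, sleep, and resubmit loop described above can be modeled with a short Python sketch. This is only an illustration of the control flow, not Oracle code: request_block, make_flaky_lms, and the max_attempts parameter are hypothetical names introduced here, and a queue with a timeout stands in for the foreground process sleeping on its timer. The real foreground process has no attempt limit, which is why it can get stuck in the loop.

```python
import queue

LOST_BLOCK_TIMEOUT_SECS = 0.5  # per the text; 6 seconds on Windows


def request_block(send_request, responses, timeout=LOST_BLOCK_TIMEOUT_SECS,
                  max_attempts=None):
    """Model of the foreground loop. Returns (block, lost_count).

    max_attempts is a hypothetical safety limit for this sketch only.
    """
    lost = 0
    attempts = 0
    while max_attempts is None or attempts < max_attempts:
        attempts += 1
        send_request()  # send (or resubmit) the request to the remote LMS
        try:
            # Sleep until a response arrives or the timer expires.
            block = responses.get(timeout=timeout)
            return block, lost
        except queue.Empty:
            # Timer expired: declare the block lost. In the database, the wait
            # is accounted to gc cr block lost or gc current block lost.
            lost += 1
    return None, lost


def make_flaky_lms(responses, drop_first):
    """Fake LMS that silently drops the first drop_first requests."""
    state = {"sent": 0}

    def send_request():
        state["sent"] += 1
        if state["sent"] > drop_first:
            responses.put("block-data")

    return send_request


if __name__ == "__main__":
    q = queue.Queue()
    # A short timeout keeps the demonstration fast.
    block, lost = request_block(make_flaky_lms(q, drop_first=2), q, timeout=0.05)
    print(block, lost)
```

In this model, each dropped response costs one full timer interval before the request is resubmitted, which is why even a small number of lost blocks can add noticeable wait time.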
