Example Problem
The following netstat output was captured while the server was experiencing reassembly errors. In this output,
~1.4 billion packets were received, ~1.3 billion of those were delivered to upper-layer protocols, and ~1.0 billion
packets were sent. Also, more than 100 million packets needed to be reassembled, and ~1.7 million of those
reassemblies failed. This output is from an OSWatcher file, and only the first sample is shown here. As there were
numerous reassembly failures, this server would require further review to understand the root cause.
Ip:
    1478633885 total packets received        <-- Total packets received so far
    28039 with invalid addresses
    0 forwarded
    0 incoming packets discarded
    1385377694 incoming packets delivered    <-- Packets delivered to upper-layer protocols
    1045105164 requests sent out             <-- Total packets sent so far
    6 outgoing packets dropped
    57 dropped because of missing route
    3455 fragments dropped after timeout
    106583778 reassemblies required          <-- Reassembly required
    32193658 packets reassembled ok
    1666996 packet reassembles failed        <-- Failed reassembly
    33504302 fragments received ok
    111652305 fragments created

There are many reasons for reassembly failures. More analysis is required using
tools such as tcpdump or Wireshark to understand the root cause. In most cases, troubleshooting reassembly failure
is a collaborative effort among DBAs, network administrators, and system administrators.
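
As a first step before reaching for tcpdump, the reassembly counters can be pulled out of a saved netstat -s sample and turned into a failure ratio. The following is a minimal sketch, assuming the Linux net-tools output layout shown above (a section header such as "Ip:" followed by "count description" lines). The function names parse_ip_counters and reassembly_failure_pct, and the placeholder Tcp line in the sample, are hypothetical and only for illustration.

```python
import re

# A few counters copied from the sample above; the Tcp line is a made-up
# placeholder that only demonstrates that other sections are skipped.
SAMPLE = """\
Ip:
    1478633885 total packets received
    106583778 reassemblies required
    32193658 packets reassembled ok
    1666996 packet reassembles failed
Tcp:
    0 placeholder counter outside the Ip section
"""


def parse_ip_counters(netstat_text):
    """Return {description: count} for the 'Ip:' section of netstat -s text."""
    counters = {}
    in_ip_section = False
    for line in netstat_text.splitlines():
        stripped = line.strip()
        # Section headers look like "Ip:", "Tcp:", "Udp:".
        if re.fullmatch(r"\w+:", stripped):
            in_ip_section = stripped == "Ip:"
            continue
        # Counter lines in the Ip section look like "<count> <description>".
        match = re.fullmatch(r"(\d+)\s+(.+)", stripped)
        if in_ip_section and match:
            counters[match.group(2)] = int(match.group(1))
    return counters


def reassembly_failure_pct(counters):
    """Failed reassemblies as a percentage of reassemblies required."""
    required = counters.get("reassemblies required", 0)
    failed = counters.get("packet reassembles failed", 0)
    return 100.0 * failed / required if required else 0.0


if __name__ == "__main__":
    ip = parse_ip_counters(SAMPLE)
    print(f"{reassembly_failure_pct(ip):.2f}% of required reassemblies failed")
```

For the sample above, the ratio works out to roughly 1.6 percent. Because netstat -s counters are cumulative, comparing two consecutive OSWatcher samples (subtracting the earlier counts from the later ones) is likely to say more about current behavior than a single sample does.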
GC Lost Block Issue
If a block request receives no response within 0.5 seconds, the block is declared lost and the lost block-related
statistics are incremented. Reasons for a lost block condition include CPU scheduling issues, incorrect network
buffer configuration, resource shortages on the server, and network configuration issues. It is a very common mistake
to treat a gc lost block issue as a network issue: you must review statistics from both the database and OS
perspectives to understand the root cause.
Oracle RAC uses the algorithm shown in Figure 9-7 to detect lost blocks. First, a foreground process sends a
request to the LMS process running on the remote node. Until a response is received from the LMS process, the
foreground process cannot continue. So, the foreground process sets up a timer with an expiry time of 0.5 seconds
(on the Windows platform, this timer expiry is set to 6 seconds) and goes to sleep. If a block, or a grant to read
the block, is received by the foreground process, then the foreground process continues processing. If no response
is received within the 0.5-second interval, then the alarm wakes up the foreground process, which declares the
block lost and accounts the wait time to gc cr block lost or gc current block lost. The foreground process then
resubmits the request for the same block to the LMS process. The foreground process can get stuck waiting for a
block in this loop if there is a problem sending or receiving the block.
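
The detection loop described above can be modeled in a few lines. This is a toy simulation, not Oracle code: a queue stands in for the interconnect, a blocking get with a timeout stands in for the sleep and alarm, and the names request_block, send_request, and replies are hypothetical. The max_attempts limit is also an assumption added so the sketch terminates; per the text, a real foreground process keeps resubmitting.

```python
import queue

LOST_BLOCK_TIMEOUT = 0.5  # seconds, per the text (6 seconds on Windows)


def request_block(send_request, replies, timeout=LOST_BLOCK_TIMEOUT,
                  max_attempts=5):
    """Request a block and resubmit each time the timer expires.

    send_request: callable that submits the request to the (simulated) LMS.
    replies:      queue.Queue on which the block or grant arrives.
    Returns (block_or_None, stats).
    """
    stats = {"requests_sent": 0, "blocks_lost": 0, "lost_wait_seconds": 0.0}
    for _ in range(max_attempts):
        send_request()
        stats["requests_sent"] += 1
        try:
            # Sleep until a reply arrives or the timer expires.
            block = replies.get(timeout=timeout)
        except queue.Empty:
            # Timer expired: declare the block lost, account the wait,
            # then loop around to resubmit the same request.
            stats["blocks_lost"] += 1
            stats["lost_wait_seconds"] += timeout
            continue
        return block, stats
    return None, stats


if __name__ == "__main__":
    replies = queue.Queue()
    sent = {"count": 0}

    def flaky_send():
        # Simulated LMS: the first two requests go unanswered.
        sent["count"] += 1
        if sent["count"] >= 3:
            replies.put("block")

    block, stats = request_block(flaky_send, replies, timeout=0.05)
    print(block, stats["requests_sent"], stats["blocks_lost"])
```

In this model, every expired timer adds a full timeout to the accumulated wait, which illustrates why even a small number of lost blocks can add noticeable response time.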
 