[Figure: series bond0 and haip plotted against # of Interconnects (1 to 3); y-axis from 0 to 600000, unit not shown]
Figure 14-12. Scalability HAIP vs. NIC bonding
Interconnect Monitoring
In the previous workshop, we discussed poor interconnect performance and optimized it by making changes at the
infrastructure level. Ideally, the fix should have been to make the SQL queries more efficient. In almost all RAC
implementations, a GigE interconnect has been more than sufficient for handling normal data transfer traffic; a
higher-bandwidth network is seldom required. However, there are always exceptions.
Monitoring and troubleshooting interconnect activity does not revolve only around high latency; put another way, it
may also revolve around performance issues that have other causes. This is almost always what is observed in
production implementations. This does not mean that monitoring the interconnect is not required, but the
interconnect should not be the primary focus when a RAC cluster shows slower performance. As discussed in
Chapter 13, when performance issues are noticed, the investigation should first look outside the interconnect. That
said, the issues that do involve the interconnect have primarily been capacity and congestion problems that cause
dropped messages, buffer overflows, or both.
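As a rough illustration of how dropped messages and buffer overflows can be spotted at the operating system level, the following Python sketch compares two snapshots of UDP counters. It assumes a Linux host where interconnect traffic travels over UDP and where the counters are read from /proc/net/snmp; neither assumption comes from the text above. The function names and the sample values are hypothetical and serve only to show the arithmetic.

```python
def parse_udp_counters(snmp_text):
    """Return the Udp counters from /proc/net/snmp-style text as a dict.

    The file lists each protocol as a header line followed by a value line,
    both starting with the protocol name (here "Udp:").
    """
    udp_lines = [line.split() for line in snmp_text.splitlines()
                 if line.startswith("Udp:")]
    if len(udp_lines) < 2:
        raise ValueError("no Udp header/value pair found")
    header, values = udp_lines[0][1:], udp_lines[1][1:]
    return dict(zip(header, (int(v) for v in values)))


def counter_deltas(before, after,
                   names=("InErrors", "RcvbufErrors", "SndbufErrors")):
    """Return how much each error counter grew between two snapshots."""
    return {name: after[name] - before[name] for name in names}


# Made-up sample snapshots for illustration only, not from a real system.
SAMPLE_BEFORE = """\
Udp: InDatagrams NoPorts InErrors OutDatagrams RcvbufErrors SndbufErrors
Udp: 1000 2 10 900 8 0
"""
SAMPLE_AFTER = """\
Udp: InDatagrams NoPorts InErrors OutDatagrams RcvbufErrors SndbufErrors
Udp: 5000 2 35 4800 30 1
"""

if __name__ == "__main__":
    before = parse_udp_counters(SAMPLE_BEFORE)
    after = parse_udp_counters(SAMPLE_AFTER)
    # A growing RcvbufErrors value suggests the receive buffer overflowed.
    print(counter_deltas(before, after))
```

On a real host, the two snapshots would be read from /proc/net/snmp some interval apart. These counters cover all UDP traffic on the server, not only the interconnect, so a rising value is a prompt for further investigation and not proof of an interconnect problem.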
Workshop
One of the production clusters had seen several issues, such as slow performance and occasional node evictions, over
the past several months. When a production cluster that has been running smoothly and performing extremely well
suddenly develops problems, it causes real concern for the DBA team. On analyzing the history of events on the
servers, the only change the DBAs could recollect making to the production servers since going live several years
earlier was the data center move from one city to another.
Checking through various statistics, the DBAs noticed a significantly high number of lost blocks. The number of
lost blocks was so high that it significantly affected production. The following output gives the lost blocks for just
one day of uptime:
 
 