Tuning the Cluster Interconnect - Expert Oracle RAC Performance Diagnostics and Tuning

[Figure: chart comparing bond0 and haip; x-axis: # of Interconnects (1 to 3); y-axis: 0 to 600000]

Figure 14-12. Scalability HAIP vs. NIC bonding

Interconnect Monitoring

In the previous workshop, we discussed poor interconnect performance and optimized it by making changes at the infrastructure level. Ideally, the fix should have been to make the SQL queries more efficient. In almost all RAC implementations, a GigE interconnect has been more than sufficient for handling normal data transfer traffic; a higher-bandwidth network is seldom required. However, there are always exceptions.

Monitoring and troubleshooting interconnect activity does not revolve only around high latency; put another way, performance issues that show up on the interconnect are often due to other reasons. This is almost always what is observed in production implementations. This does not mean that monitoring the interconnect is unnecessary, but the interconnect should not be the primary focus when a RAC cluster shows slower performance. As discussed in Chapter 13, when performance issues are noticed, the investigation should look outside the interconnect. That said, issues surrounding the interconnect itself have primarily been with capacity and congestion, which cause dropped messages and/or buffer overflows.
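
As a rough illustration of how dropped messages and buffer overflows might be spotted at the operating system level, the following minimal Python sketch parses the UDP counters that Linux exposes in /proc/net/snmp and reports how much the error counters grew between two samples. It assumes a Linux host and an interconnect that carries its traffic over UDP; the function names parse_udp_counters and udp_drop_delta are hypothetical and not part of any Oracle tool. The counters are host-wide, so a rise only hints at a problem and does not prove that the private interconnect is the cause.

```python
def parse_udp_counters(snmp_text):
    """Return the 'Udp:' counters from /proc/net/snmp-style text as a dict.

    The file lists each protocol as two lines: a header line with the
    counter names, followed by a line with the matching values.
    """
    udp_lines = [line.split() for line in snmp_text.splitlines()
                 if line.startswith("Udp:")]
    if len(udp_lines) < 2:
        raise ValueError("no Udp header/value pair found")
    header, values = udp_lines[0][1:], udp_lines[1][1:]
    return dict(zip(header, (int(v) for v in values)))


def udp_drop_delta(before, after,
                   keys=("InErrors", "RcvbufErrors", "SndbufErrors")):
    """Return the growth of the error counters between two samples.

    InErrors counts datagrams that could not be delivered; RcvbufErrors and
    SndbufErrors count failures caused by full socket buffers.
    """
    return {k: after[k] - before[k]
            for k in keys if k in before and k in after}
```

To use it, read /proc/net/snmp twice, some interval apart, pass each reading through parse_udp_counters, and compare the two results with udp_drop_delta. Counters that keep climbing while the cluster is under load would be consistent with the capacity and congestion problems described above.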

Workshop

One of the production clusters had experienced several issues, such as slow performance and occasional node evictions, over the past several months. When a production cluster that has been running smoothly and performing extremely well suddenly develops problems, it causes real concern for the DBA team. On analyzing the history of changes to the servers, the only change the DBAs could recall making since go-live several years earlier was the data center move from one city to another.

Checking through various statistics, the DBAs noticed a significantly high number of lost blocks. The number of lost blocks was so high that it significantly affected production. The following output gives the lost blocks for just one day of uptime:
