Local Naming Parameter ENABLE=BROKEN - Secrets of the Oracle Database - page 419


You can also use the TCP_KEEPALIVE_THRESHOLD socket option on individual applications to override the default interval so that each application can have its own interval on each socket. The option value is an unsigned integer in milliseconds. See also tcp(7P).
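To make the per-socket override concrete, here is a minimal Python sketch. It is not taken from the Solaris manual or from Oracle. It enables SO_KEEPALIVE and then sets TCP_KEEPALIVE_THRESHOLD, assuming the local Python build exposes that Solaris constant. Where it does not, the sketch falls back to the Linux option TCP_KEEPIDLE, a swapped-in equivalent that is measured in seconds rather than milliseconds. The function name set_keepalive_threshold is hypothetical.

```python
import socket


def set_keepalive_threshold(sock: socket.socket, threshold_ms: int) -> str:
    """Enable keep-alive on sock and try to set its idle threshold.

    Returns the name of the option used for the threshold, or "none" if the
    platform offers neither option (SO_KEEPALIVE is still enabled then).
    """
    # Keep-alive probes are only sent on sockets that have SO_KEEPALIVE set.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # Solaris: per-socket threshold in milliseconds. Assumption: this Python
    # build exposes the constant; getattr() avoids an AttributeError if not.
    solaris_opt = getattr(socket, "TCP_KEEPALIVE_THRESHOLD", None)
    if solaris_opt is not None:
        sock.setsockopt(socket.IPPROTO_TCP, solaris_opt, threshold_ms)
        return "TCP_KEEPALIVE_THRESHOLD"

    # Swapped-in fallback (Linux and others): TCP_KEEPIDLE takes whole
    # seconds, so round the millisecond value up.
    linux_opt = getattr(socket, "TCP_KEEPIDLE", None)
    if linux_opt is not None:
        seconds = max(1, -(-threshold_ms // 1000))
        sock.setsockopt(socket.IPPROTO_TCP, linux_opt, seconds)
        return "TCP_KEEPIDLE"

    return "none"


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        # Ask for the first probe after ten minutes (600000 ms) of idle time.
        print(set_keepalive_threshold(s, 600000))
```

The option is applied per socket, so other applications on the same host keep the system-wide default.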

The Solaris manual goes on to state that the commitment level for the parameter is unstable and that it should not be changed. To the best of my knowledge, the TCP_KEEPALIVE_THRESHOLD socket option is not used by ORACLE DBMS software. Instead of modifying keep-alive settings, the Solaris documentation recommends changing the re-transmit time-outs (tcp_rexmit_interval_max and tcp_ip_abort_interval).
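Because ndd expects these Solaris TCP tunables in milliseconds, it is easy to be off by a factor of 1000 when working from values in seconds. The following Python sketch only builds the command line and never executes it. The helper name ndd_set_command is hypothetical, and I assume the ndd -set /dev/tcp parameter value syntax of Solaris releases that still use ndd. The values 45 and 30 in the usage part are merely sample inputs.

```python
import shlex

# Solaris TCP tunables mentioned in this section; ndd takes all of them in milliseconds.
MILLISECOND_PARAMS = {
    "tcp_rexmit_interval_max",
    "tcp_ip_abort_interval",
    "tcp_ip_abort_cinterval",
    "tcp_keepalive_interval",
}


def ndd_set_command(param: str, seconds: float) -> list[str]:
    """Return an 'ndd -set /dev/tcp' argument list for a timeout given in seconds.

    The command is only built, never run.
    """
    if param not in MILLISECOND_PARAMS:
        raise ValueError(f"unsupported parameter: {param}")
    millis = int(round(seconds * 1000))  # seconds -> milliseconds
    return ["ndd", "-set", "/dev/tcp", param, str(millis)]


if __name__ == "__main__":
    for name, secs in (("tcp_ip_abort_interval", 45), ("tcp_rexmit_interval_max", 30)):
        # Prints a shell-ready command line, e.g. for review before running it as root.
        print(shlex.join(ndd_set_command(name, secs)))
```

Values set with ndd do not survive a reboot, so they are normally also placed in a startup script.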

A while back, my own testing with Oracle8i confirmed that ENABLE=BROKEN is functional and useful, provided that tcp_keepalive_interval and tcp_keepalive_abort_interval are adjusted as needed. Tracing with truss showed that it is still implemented in Oracle10g. Keep in mind that an appropriate test for all of these TCP/IP settings consists of pulling the network cable (and keeping it pulled), switching off the server, or bringing down the operating system by some other method (Stop+A on Solaris), such that it has no chance of sending a message to the remote system indicating that sockets should be closed. Such a test should be performed on any RAC cluster before it moves into production.

Before I move off topic any further, let me explain why ENABLE=BROKEN should be considered an outdated feature. With Oracle10g Clusterware and virtual IP addresses, the IP address that went down on a failed host is brought back online on a surviving node. Re-transmits by the client should then be redirected to that surviving node and fail, since it knows nothing about the sockets that were open on the failed node and answers with a TCP reset. As part of virtual IP address (VIP) failover, Oracle Clusterware flushes the address resolution protocol (ARP) cache, which translates between IP addresses and MAC (medium access control) addresses of Ethernet adapters. This step is undocumented, but it is essential for successful reconnects by database clients, which must become aware of the new mapping between IP address and MAC address. On Linux, the ARP cache is flushed by executing the command /sbin/arping -q -U -c 3 -I adapter ip_address in the script $ORA_CRS_HOME/bin/racgvip; the -U option sends unsolicited ARP requests that update the ARP caches of neighboring hosts. The mapping between MAC and IP addresses may be displayed with the command arp -a on UNIX as well as Windows.
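Since clients only reconnect successfully once they have learned the new mapping, it can help to compare arp -a output on a client before and after a VIP failover. The Python sketch below is a best-effort parser for that purpose. The output of arp -a differs between Linux, BSD-derived systems, Solaris, and Windows, so the sketch simply takes the first IPv4 address and the first MAC address on each line; this is my own heuristic, not a documented format. The function name parse_arp_table and the sample lines are made up for illustration.

```python
import re

# First dotted-quad IPv4 address on a line.
IPV4 = re.compile(r"\b(\d{1,3}(?:\.\d{1,3}){3})\b")
# MAC address with ':' (UNIX) or '-' (Windows) separators; BSD may drop leading zeros.
MAC = re.compile(r"\b([0-9A-Fa-f]{1,2}(?:[:-][0-9A-Fa-f]{1,2}){5})\b")


def parse_arp_table(output: str) -> dict[str, str]:
    """Map IP address -> normalized MAC address from 'arp -a' output (best effort)."""
    table = {}
    for line in output.splitlines():
        ip = IPV4.search(line)
        mac = MAC.search(line)
        if ip and mac:  # header lines and incomplete entries have no MAC
            octets = re.split(r"[:-]", mac.group(1))
            table[ip.group(1)] = ":".join(o.lower().zfill(2) for o in octets)
    return table


if __name__ == "__main__":
    sample = "\n".join([
        "gw (192.168.1.1) at 00:11:22:33:44:55 [ether] on eth0",   # Linux style
        "  10.0.0.5           0A-1B-2C-3D-4E-5F     dynamic",      # Windows style
    ])
    for ip, mac in parse_arp_table(sample).items():
        print(ip, mac)
```

If the MAC address listed for the VIP changes after failover, the client has picked up the new mapping.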

Additional information on the subject of TCP/IP and failover is in Metalink note 249213.1. According to the note, Sun Microsystems suggests setting tcp_keepalive_interval, tcp_ip_abort_cinterval (reducing it prevents connect attempts to the failed node from waiting up to three minutes), and tcp_ip_abort_interval (by default, a connection is closed after not receiving an acknowledgment for eight minutes). Unfortunately, the Metalink note does not state that tcp_keepalive_interval is ignored unless SO_KEEPALIVE is set on the socket, which in turn requires ENABLE=BROKEN.
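The point about SO_KEEPALIVE is easy to check from any program. This Python sketch does not touch Oracle at all; the helper name keepalive_enabled is hypothetical. It inspects the flag on a fresh TCP socket and again after enabling it. RFC 1122 requires keep-alive to default to off, so the first check should report False. Setting this flag is what ENABLE=BROKEN causes the Oracle client to do, as noted in the preceding paragraph.

```python
import socket


def keepalive_enabled(sock: socket.socket) -> bool:
    """Return True if SO_KEEPALIVE is set on sock.

    Without this flag, the system-wide keep-alive interval
    (tcp_keepalive_interval on Solaris) is never applied to the connection.
    """
    # Some BSD-derived kernels report the option's flag value rather than 1,
    # so compare against zero instead of one.
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        print("default:", keepalive_enabled(s))
        s.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
        print("after setsockopt:", keepalive_enabled(s))
```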

I recommend adjusting tcp_ip_abort_cinterval, so that connections initiated before the virtual IP address has come back online on a surviving node do not lock up for up to three minutes. I also suggest reducing the values of the parameters tcp_ip_abort_interval and tcp_rexmit_interval_max to 45 seconds (default: 8 minutes) and 30 seconds (default: 3 minutes), respectively. These parameters must be changed on the database client machine; remember,
