Testing for Availability - Expert Oracle RAC Performance Diagnostics and Tuning


Step 5

After n minutes, the HBA1 cable that was removed is plugged back into its interface, and the cable from the second HBA device (HBA2) connected to Node db1 is removed. As before, the status of the test is recorded in the spreadsheet. All HBA interfaces are tested this way, one after the other.

Step 6

What if more than one HBA fails? What if all the HBAs fail? What if there is a cascade effect on the HBA devices, that is, they start failing one after the other? All possible scenarios should be tested using similar methods.

After the individual HBA devices have been tested, it's time to pull the cables from all HBA devices connected to Node db1 simultaneously and record the results (illustrated in Table 3-3).

Table 3-3. Test 1 Results Recorded

Test #  Host Name  Interface      Test method  Expected Behavior             Observation / Result  Status  Node1  Node2  Node3
------  ---------  -------------  -----------  ----------------------------  --------------------  ------  -----  -----  -----
1       prddb1     HBA1           Pull         Stable server                 Cluster stable        OK      UP
2       prddb1     HBA2           Pull         Stable server                 Cluster stable        OK      UP
3       prddb1     HBA1 and HBA2  Pull         Server panic & node eviction  prddb1 crashed        OK      CRASH  UP

Note that in Test 3 (illustrated in Table 3-3), when the cables from both HBA interfaces were pulled, the server crashed. The status is still “OK” because this was the expected result. Node db1 had two HBA devices, and when the cables from both devices were pulled, the clusterware panicked, which in turn caused a node eviction.
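The status rule described here (a crash still counts as "OK" when a crash was the expected behavior) can be sketched in a few lines of Python. This is only an illustration, not the spreadsheet used in the text: the `Outcome` codes, the `HbaTest` class, the "FAIL" label, and the CSV layout are hypothetical names introduced for the example.

```python
import csv
import io
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    # Hypothetical normalized outcomes; the table itself uses free text.
    STABLE = "stable"
    NODE_EVICTED = "node evicted"


@dataclass
class HbaTest:
    test_no: int
    host: str
    interface: str
    expected: Outcome
    observed: Outcome

    @property
    def status(self) -> str:
        # "OK" means the observation matched the expectation,
        # even when the expected behavior is a crash.
        return "OK" if self.observed == self.expected else "FAIL"


def to_csv(tests):
    """Render the test records as CSV text for a results spreadsheet."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Test #", "Host Name", "Interface",
                     "Expected", "Observed", "Status"])
    for t in tests:
        writer.writerow([t.test_no, t.host, t.interface,
                         t.expected.value, t.observed.value, t.status])
    return buf.getvalue()


# The three tests from Table 3-3, expressed with the hypothetical codes.
tests = [
    HbaTest(1, "prddb1", "HBA1", Outcome.STABLE, Outcome.STABLE),
    HbaTest(2, "prddb1", "HBA2", Outcome.STABLE, Outcome.STABLE),
    HbaTest(3, "prddb1", "HBA1 and HBA2",
            Outcome.NODE_EVICTED, Outcome.NODE_EVICTED),
]
print(to_csv(tests))
```

Keeping the expected and observed outcomes as separate fields makes it clear that the status column reports agreement with the plan, not the health of the node.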

Because the load testing tool uses the tnsnames.ora file and a TAF policy has been set, it is also worth recording whether the users failed over from db1 to one or more of the other instances in the cluster. Along with the status, it's important to record the number of users who failed over and the time they took to fail over.
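One way to count the users who failed over is to query GV$SESSION, which reports FAILOVER_TYPE, FAILOVER_METHOD, and FAILED_OVER for each session. The Python sketch below only summarizes rows that such a query would return; it does not connect to a database. The `LOADTEST` user name and the sample rows are assumptions made up for illustration, not data from the test described here.

```python
from collections import Counter

# GV$SESSION exposes FAILOVER_TYPE, FAILOVER_METHOD and FAILED_OVER per session.
# LOADTEST is a hypothetical user name for the load-testing sessions.
FAILOVER_QUERY = """
SELECT inst_id, failover_type, failover_method, failed_over
FROM   gv$session
WHERE  username = 'LOADTEST'
"""


def summarize_failover(rows):
    """Count sessions per instance that report FAILED_OVER = 'YES'.

    rows: iterable of (inst_id, failover_type, failover_method, failed_over)
    tuples, in the column order selected by FAILOVER_QUERY.
    """
    failed = Counter()
    for inst_id, _ftype, _fmethod, failed_over in rows:
        if failed_over == "YES":
            failed[inst_id] += 1
    return dict(failed)


# Made-up sample rows for illustration only.
sample_rows = [
    (2, "SELECT", "BASIC", "YES"),
    (2, "SELECT", "BASIC", "YES"),
    (3, "SELECT", "BASIC", "YES"),
    (3, "SELECT", "BASIC", "NO"),
]
print(summarize_failover(sample_rows))
```

This view does not record how long each session took to fail over, so that figure would have to come from another source, such as the load testing tool's own timings.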

For documentation and later verification, it's also good practice to capture the relevant entries from the database alert logs, the CRS logs, the CSS logs, and the system log file (/var/log/messages).

Output from /var/log/messages

Feb 11 19:01:57 prddb1 kernel: qla2xxx 0000:0a:00.0: LOOP DOWN detected (2 3 0).

Feb 11 19:02:13 prddb1 kernel: rport-2:0-0: blocked FC remote port time out: saving binding

Feb 11 19:02:13 prddb1 kernel: rport-2:0-1: blocked FC remote port time out: saving binding

Feb 11 19:02:13 prddb1 kernel: rport-2:0-2: blocked FC remote port time out: saving binding

Feb 11 19:02:13 prddb1 kernel: rport-2:0-3: blocked FC remote port time out: saving binding

>>>>>SERVER CRASH

Feb 11 19:02:13 prddb1 kernel: Error:Mpx:Path Bus 2 Tgt 0 Lun 0 to 0F278 is dead.

Feb 11 19:02:13 prddb1 kernel: Error:Mpx:Killing bus 2 to HP OPEN 0F278 port 3A.

Feb 11 19:02:13 prddb1 kernel: Error:Mpx:Path Bus 2 Tgt 0 Lun 7 to 0F278 is dead.

Feb 11 19:02:13 prddb1 kernel: Error:Mpx:Path Bus 2 Tgt 0 Lun 6 to 0F278 is dead.

Feb 11 19:02:13 prddb1 kernel: Error:Mpx:Path Bus 2 Tgt 0 Lun 3 to 0F278 is dead.

. . . . . . . . .

. . . . . . . .
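When many tests are recorded, captured system log output like the excerpt above can be summarized with a short script. The Python sketch below is one possible approach, assuming the message formats shown in this excerpt. The `summarize` function and its result keys are hypothetical names, and the sample lines are copied from the output above.

```python
import re
from datetime import datetime

LINE_RE = re.compile(r"^(\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) (\S+) kernel: (.*)$")
DEAD_PATH_RE = re.compile(r"Path Bus (\d+) Tgt (\d+) Lun (\d+) to (\S+) is dead")


def summarize(lines):
    """Extract the HBA failure timeline from syslog kernel messages."""
    summary = {"loop_down": None, "rport_timeouts": 0, "dead_luns": []}
    first_timeout = None
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skips annotations such as ">>>>>SERVER CRASH"
        stamp, _host, msg = m.groups()
        # Syslog stamps carry no year; 2000 is a placeholder, since only
        # differences between stamps are used.
        when = datetime.strptime("2000 " + " ".join(stamp.split()),
                                 "%Y %b %d %H:%M:%S")
        if "LOOP DOWN" in msg and summary["loop_down"] is None:
            summary["loop_down"] = when
        elif "blocked FC remote port time out" in msg:
            summary["rport_timeouts"] += 1
            first_timeout = first_timeout or when
        else:
            dead = DEAD_PATH_RE.search(msg)
            if dead:
                summary["dead_luns"].append(int(dead.group(3)))
    if summary["loop_down"] and first_timeout:
        delta = first_timeout - summary["loop_down"]
        summary["seconds_to_timeout"] = delta.total_seconds()
    return summary


sample = [
    "Feb 11 19:01:57 prddb1 kernel: qla2xxx 0000:0a:00.0: LOOP DOWN detected (2 3 0).",
    "Feb 11 19:02:13 prddb1 kernel: rport-2:0-0: blocked FC remote port time out: saving binding",
    "Feb 11 19:02:13 prddb1 kernel: rport-2:0-1: blocked FC remote port time out: saving binding",
    "Feb 11 19:02:13 prddb1 kernel: rport-2:0-2: blocked FC remote port time out: saving binding",
    "Feb 11 19:02:13 prddb1 kernel: rport-2:0-3: blocked FC remote port time out: saving binding",
    ">>>>>SERVER CRASH",
    "Feb 11 19:02:13 prddb1 kernel: Error:Mpx:Path Bus 2 Tgt 0 Lun 0 to 0F278 is dead.",
    "Feb 11 19:02:13 prddb1 kernel: Error:Mpx:Killing bus 2 to HP OPEN 0F278 port 3A.",
    "Feb 11 19:02:13 prddb1 kernel: Error:Mpx:Path Bus 2 Tgt 0 Lun 7 to 0F278 is dead.",
    "Feb 11 19:02:13 prddb1 kernel: Error:Mpx:Path Bus 2 Tgt 0 Lun 6 to 0F278 is dead.",
    "Feb 11 19:02:13 prddb1 kernel: Error:Mpx:Path Bus 2 Tgt 0 Lun 3 to 0F278 is dead.",
]
result = summarize(sample)
print(result["rport_timeouts"], result["dead_luns"], result["seconds_to_timeout"])
```

The interval between the LOOP DOWN message and the first remote port timeout is a useful figure to note alongside the test status, because it shows how long the host waited before declaring the paths lost.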
