Database Reference
Step 5
After n minutes, the HBA1 cable that was removed is plugged back into its interface, and the cable from the second
HBA device (HBA2) connected to Node db1 is removed. As before, the status of the test is recorded in the spreadsheet,
and the remaining HBA interfaces are tested one after the other.
Step 6
What if all the HBAs fail? What if there is a cascade effect on the HBA devices, that is, they start failing one after the
other? What if more than one HBA fails? All possible scenarios should be tested using similar methods.
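One way to make sure no combination is missed is to generate the list of scenarios rather than write it by hand. The following is a minimal sketch, not part of the original test plan: the function name hba_failure_scenarios is hypothetical, and it enumerates only which HBAs are pulled together, so a cascade test (the order and timing of successive failures) would still have to be scripted separately.

```python
from itertools import combinations


def hba_failure_scenarios(hbas):
    """Return every non-empty subset of HBAs to pull, smallest subsets first."""
    scenarios = []
    for size in range(1, len(hbas) + 1):
        scenarios.extend(combinations(hbas, size))
    return scenarios


# Node db1 has two HBA devices in this test plan.
for number, pulled in enumerate(hba_failure_scenarios(["HBA1", "HBA2"]), start=1):
    print(number, " and ".join(pulled))
```

For two HBAs this yields three scenarios: each HBA on its own, then both together.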
After the individual HBA devices have been tested, it's time to pull the cables from all HBA devices connected to
Node db1 simultaneously and record the results (illustrated in Table 3-3).
Table 3-3. Test 1 Results Recorded

Test # | Host Name | Interface     | Test Method | Expected Behavior            | Observation / Result | Status | Node1 | Node2 | Node3
1      | prddb1    | HBA1          | Pull        | Stable server                | Cluster Stable       | OK     | UP    | UP    | UP
2      | prddb1    | HBA2          | Pull        | Stable server                | Cluster Stable       | OK     | UP    | UP    | UP
3      | prddb1    | HBA1 and HBA2 | Pull        | Server panic & node eviction | prddb1 crashed       | OK     | CRASH | UP    | UP

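If the spreadsheet is kept as a CSV file, the results can be recorded by a small script instead of by hand. This is only a sketch under that assumption: the field names and the write_results helper are hypothetical, and the rows are the three tests from Table 3-3.

```python
import csv
import io

# Hypothetical column names mirroring the headings of Table 3-3.
FIELDS = ["test", "host", "interface", "method", "expected",
          "observed", "status", "node1", "node2", "node3"]


def write_results(rows, stream):
    """Write the test results as CSV, header first, to any text stream."""
    writer = csv.DictWriter(stream, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


results = [
    dict(zip(FIELDS, ["1", "prddb1", "HBA1", "Pull", "Stable server",
                      "Cluster Stable", "OK", "UP", "UP", "UP"])),
    dict(zip(FIELDS, ["2", "prddb1", "HBA2", "Pull", "Stable server",
                      "Cluster Stable", "OK", "UP", "UP", "UP"])),
    dict(zip(FIELDS, ["3", "prddb1", "HBA1 and HBA2", "Pull",
                      "Server panic & node eviction", "prddb1 crashed",
                      "OK", "CRASH", "UP", "UP"])),
]

# An in-memory buffer keeps the sketch self-contained; a real run would
# pass a file opened with open(path, "w", newline="") instead.
buffer = io.StringIO()
write_results(results, buffer)
print(buffer.getvalue())
```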
Note that in Test 3 (illustrated in Table 3-3), when the cables from both HBA interfaces were pulled, the
server crashed. The status is nevertheless recorded as “OK” because the result was as expected. Node db1 had
two HBA devices, and when the cables from both devices were pulled, the clusterware panicked, which then caused a
node eviction.
Because the load testing tool uses the tnsnames.ora file and a TAF policy has been set, it is also worth
recording whether the users failed over from db1 to one or more of the other instances in the cluster. Along with the
status, it's important to record the number of users who failed over and the time they took to fail over.
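The failover figures can be reduced to a few numbers per test. The sketch below assumes the load testing tool can export one record per failed-over session; the Failover record, the summarize function, the instance names db2 and db3, and the sample timings are all hypothetical placeholders, not measurements from this test.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Failover:
    user: str
    from_instance: str
    to_instance: str
    seconds: float  # time from losing the session until it resumed


def summarize(events):
    """Count failed-over users per target instance and report failover times."""
    per_target = {}
    for event in events:
        per_target[event.to_instance] = per_target.get(event.to_instance, 0) + 1
    times = [event.seconds for event in events]
    return {
        "users_failed_over": len(events),
        "per_target_instance": per_target,
        "max_seconds": max(times) if times else 0.0,
        "mean_seconds": mean(times) if times else 0.0,
    }


# Placeholder records for illustration only.
sample = [
    Failover("user01", "db1", "db2", 4.0),
    Failover("user02", "db1", "db3", 6.0),
    Failover("user03", "db1", "db2", 5.0),
]
print(summarize(sample))
```

Recording the maximum alongside the mean is a deliberate choice here: a single slow session is easy to miss in an average.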
For documentation and later verification, it's also good practice to capture the log information from
the database alert logs, CRS logs, CSS logs, and the system log file (/var/log/messages).
Output from /var/log/messages
Feb 11 19:01:57 prddb1 kernel: qla2xxx 0000:0a:00.0: LOOP DOWN detected (2 3 0).
Feb 11 19:02:13 prddb1 kernel: rport-2:0-0: blocked FC remote port time out: saving binding
Feb 11 19:02:13 prddb1 kernel: rport-2:0-1: blocked FC remote port time out: saving binding
Feb 11 19:02:13 prddb1 kernel: rport-2:0-2: blocked FC remote port time out: saving binding
Feb 11 19:02:13 prddb1 kernel: rport-2:0-3: blocked FC remote port time out: saving binding
>>>>>SERVER CRASH
Feb 11 19:02:13 prddb1 kernel: Error:Mpx:Path Bus 2 Tgt 0 Lun 0 to 0F278 is dead.
Feb 11 19:02:13 prddb1 kernel: Error:Mpx:Killing bus 2 to HP OPEN 0F278 port 3A.
Feb 11 19:02:13 prddb1 kernel: Error:Mpx:Path Bus 2 Tgt 0 Lun 7 to 0F278 is dead.
Feb 11 19:02:13 prddb1 kernel: Error:Mpx:Path Bus 2 Tgt 0 Lun 6 to 0F278 is dead.
Feb 11 19:02:13 prddb1 kernel: Error:Mpx:Path Bus 2 Tgt 0 Lun 3 to 0F278 is dead.
. . . . . . . . .
. . . . . . . .
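When many such excerpts have to be compared across tests, the dead-path messages can be pulled out of the log with a short script. This is a sketch that assumes the message layout shown in the excerpt above; the pattern and the dead_paths function are hypothetical, and other multipath drivers will log a different format.

```python
import re

# Matches lines such as:
# Feb 11 19:02:13 prddb1 kernel: Error:Mpx:Path Bus 2 Tgt 0 Lun 0 to 0F278 is dead.
DEAD_PATH = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) (?P<host>\S+) kernel: "
    r"Error:Mpx:Path Bus (?P<bus>\d+) Tgt (?P<tgt>\d+) Lun (?P<lun>\d+) "
    r"to (?P<array>\S+) is dead\."
)


def dead_paths(lines):
    """Return one dict per 'path is dead' message; all other lines are skipped."""
    found = []
    for line in lines:
        match = DEAD_PATH.match(line)
        if match:
            found.append(match.groupdict())
    return found


# Lines copied from the excerpt above.
sample = [
    "Feb 11 19:01:57 prddb1 kernel: qla2xxx 0000:0a:00.0: LOOP DOWN detected (2 3 0).",
    "Feb 11 19:02:13 prddb1 kernel: Error:Mpx:Path Bus 2 Tgt 0 Lun 0 to 0F278 is dead.",
    "Feb 11 19:02:13 prddb1 kernel: Error:Mpx:Killing bus 2 to HP OPEN 0F278 port 3A.",
    "Feb 11 19:02:13 prddb1 kernel: Error:Mpx:Path Bus 2 Tgt 0 Lun 7 to 0F278 is dead.",
]
for path in dead_paths(sample):
    print(path["stamp"], path["host"], "bus", path["bus"], "lun", path["lun"])
```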
 