Backup and Recovery - Oracle Exadata Recipes: A Problem-Solution Approach


Performance disk statuses by way of its built-in monitoring. Exadata Storage Servers monitor disk drives and collect

information such as temperature, read/write errors, speed, and performance.

If a disk shows a predictive failure condition, it means that the server has experienced one or more read/write

error conditions, temperature threshold conditions, and so forth; this indicates that a disk failure could be imminent.

In this case, you should replace your disk using the same procedures outlined in the Solution of this recipe.

When a disk reports a poor performance condition, it should also be replaced using the steps provided in this

recipe. Each Exadata cell disk should exhibit the same performance characteristics and if one is performing poorly

based on performance metrics collected by the storage server, it could adversely affect your database performance.

In the case of a physical disk failure, Oracle automatically changes the physicaldisk and lun statuses

from normal to critical. It then drops the celldisk and each griddisk on the celldisk. When the grid disk or disks

are dropped, ASM drops the corresponding ASM disks using the FORCE option, as shown in the ASM instance's

alert log:

SQL> /* Exadata Auto Mgmt: Proactive DROP ASM Disk */

alter diskgroup RECO_CM01 drop

disk RECO_CD_05_CM01CEL01 force

NOTE: GroupBlock outside rolling migration privileged region

NOTE: requesting all-instance membership refresh for group=3

Tue Jul 05 21:48:13 2011

NOTE: Attempting voting file refresh on diskgroup DBFS_DG

GMON updating for reconfiguration, group 3 at 28 for pid 35, osid 12377

NOTE: group 3 PST updated.

NOTE: membership refresh pending for group 3/0x833f0667 (RECO_CM01)

WARNING: Disk 35 (_DROPPED_0035_RECO_CM01) in group 3 will be dropped in: (12960) secs on ASM inst 1

GMON querying group 3 at 29 for pid 19, osid 11535

SUCCESS: refreshed membership for 3/0x833f0667 (RECO_CM01)

SUCCESS: /* Exadata Auto Mgmt: Proactive DROP ASM Disk */

alter diskgroup RECO_CM01 drop

disk RECO_CD_05_CM01CEL01 force
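
The WARNING line in the excerpt above carries a countdown in seconds before the disk is dropped. As a minimal sketch, the following Python extracts the disk number, disk name, group, and remaining time from such a line and converts the time to hours. The regular expression and the function name parse_drop_warning are assumptions derived from this one sample line, not a documented log format.

```python
import re

# Assumed pattern, inferred from the single WARNING line in the excerpt above.
DROP_WARNING = re.compile(
    r"WARNING: Disk (?P<disk_no>\d+) \((?P<name>\S+)\) in group (?P<group>\d+) "
    r"will be dropped in: \((?P<secs>\d+)\) secs"
)

def parse_drop_warning(line):
    """Return a dict describing the pending drop, or None if the line does not match."""
    match = DROP_WARNING.search(line)
    if match is None:
        return None
    secs = int(match.group("secs"))
    return {
        "disk_no": int(match.group("disk_no")),
        "name": match.group("name"),
        "group": int(match.group("group")),
        "secs": secs,
        "hours": secs / 3600,  # convert the countdown to hours
    }

sample = (
    "WARNING: Disk 35 (_DROPPED_0035_RECO_CM01) in group 3 "
    "will be dropped in: (12960) secs on ASM inst 1"
)
info = parse_drop_warning(sample)
print(info["name"], info["secs"], info["hours"])
```

For the sample line, the countdown of 12960 seconds works out to 3.6 hours, which appears to correspond to the default value of the ASM disk_repair_time attribute.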

When a physical disk enters a predictive failure state, the physicaldisk and lun statuses change from

normal to predictive failure. After this, the celldisk and each griddisk on the celldisk are dropped. When this

happens, the ASM disks are dropped without the FORCE option.
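
The difference between the two cases (FORCE for a hard failure, no FORCE for a predictive failure) can be checked in the alert log text itself. The following is a minimal sketch, assuming the DROP statements look like the one in the excerpt earlier in this recipe. The function name find_disk_drops is hypothetical, and the non-forced sample statement is constructed for illustration rather than taken from a real log.

```python
import re

# Assumed shape of the auto-management DROP statement, based on the alert log
# excerpt in this recipe; tokens may be separated by spaces or newlines.
DROP_STMT = re.compile(
    r"alter\s+diskgroup\s+(?P<diskgroup>\S+)\s+drop\s+disk\s+(?P<disk>\S+)"
    r"(?P<force>\s+force\b)?",
    re.IGNORECASE,
)

def find_disk_drops(log_text):
    """Return a sorted list of unique (diskgroup, disk, forced) tuples."""
    drops = set()
    for match in DROP_STMT.finditer(log_text):
        forced = match.group("force") is not None
        drops.add((match.group("diskgroup"), match.group("disk"), forced))
    return sorted(drops)

# Statement text as it appears in the alert log excerpt (hard failure).
forced_log = (
    "SQL> /* Exadata Auto Mgmt: Proactive DROP ASM Disk */\n"
    "alter diskgroup RECO_CM01 drop\n"
    "disk RECO_CD_05_CM01CEL01 force\n"
    "NOTE: GroupBlock outside rolling migration privileged region\n"
)

# Hypothetical statement without FORCE (predictive failure), for illustration only.
plain_log = (
    "alter diskgroup DATA_CM01 drop\n"
    "disk DATA_CD_05_CM01CEL01\n"
    "NOTE: requesting all-instance membership refresh for group=1\n"
)

print(find_disk_drops(forced_log))
print(find_disk_drops(plain_log))
```

Because the alert log records each statement twice (once when issued and once on SUCCESS), the sketch collects results in a set so that each drop is reported only once.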

When the failed disk is replaced, the following things occur:

• The firmware on the new disk is updated to reflect the same firmware version as the other disks in the cell.

• The cell disk is recreated to match the disk it replaced.

• The replacement celldisk is brought online and its status is set to normal.

• Each griddisk on the celldisk is onlined and has its status marked active.

• The grid disks will automatically be added to Oracle ASM, resynchronized, and brought online in their ASM disk group.

If you look in your ASM instance's alert log after replacing a failed disk, you will see messages similar to the

following ones. These are examples of Exadata's automatic disk management capability:

SQL> /* Exadata Auto Mgmt: ADD ASM Disk in given FAILGROUP */

alter diskgroup DATA_CM01 add

failgroup CM01CEL01

disk 'o/192.168.10.3/DATA_CD_05_cm01cel01'
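
The disk string in the ADD statement above has the form o/<cell IP address>/<grid disk name>. As a small sketch, the following Python splits such a string into its cell address and grid disk name, which can help when matching alert log entries to a specific storage cell. The function name parse_griddisk_path is hypothetical.

```python
def parse_griddisk_path(path):
    """Split an Exadata grid disk path into (cell_ip, griddisk_name)."""
    # Strip surrounding whitespace and the single quotes used in the SQL text.
    parts = path.strip().strip("'").split("/", 2)
    if len(parts) != 3 or parts[0] != "o":
        raise ValueError("not an o/<cell>/<griddisk> path: %r" % path)
    return parts[1], parts[2]

cell_ip, griddisk = parse_griddisk_path("'o/192.168.10.3/DATA_CD_05_cm01cel01'")
print(cell_ip, griddisk)
```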
