Sizing Exadata - Oracle Exadata Recipes: A Problem-Solution Approach


For this discussion, we will use the following definitions:

• P(surv) = The probability of survival in the event of a single drive failure. In other words, the probability of not losing access to data.

• n = The number of disks that comprise a failure set. With Oracle ASM on systems with “many disks,” you will have eight (8) partner disks for each mirror, whether the disk group is configured with normal or high redundancy. The difference between the two is the number of mirrors, not the number of partner disks.

• λ = Rate of failure of a disk drive. This is the inverse of the drive's published Mean Time Between Failures (MTBF).

• Trepair = Time to repair a failed drive.

Our formulas for the probability of survival, which is a measure of the risk of data loss, can be expressed as the following:

• ASM Normal Redundancy: P(surv) = exp(-n * λ * Trepair)

• ASM High Redundancy: P(surv) = (1 + n * λ * Trepair) * exp(-n * λ * Trepair)

If we consider independent disk drive failures and use a 1,000,000-hour MTBF (so λ = 1/1000000 per hour) and a 24-hour time to repair a failed disk, our probability of survival with ASM normal redundancy is the following:

P(surv) = exp(-n * λ * Trepair)

= exp(-8 * (1/1000000) * 24)

= 99.98%

With ASM high redundancy, our survival probability is:

P(surv) = (1 + n * λ * Trepair) * exp(-n * λ * Trepair)

= (1 + 8 * (1/1000000) * 24) * exp(-8 * (1/1000000) * 24)

> 99.99% (approximately 99.999998%)
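The text does not say where the two formulas come from. One way to read them is as Poisson probabilities. This reading is an interpretation, not something the text states. It assumes that the n partner disks fail independently at a constant rate λ, so that the number of further failures N during the repair window follows a Poisson distribution with mean μ. The symbols N and μ are introduced here for illustration only. Under that assumption, normal redundancy survives only if no partner disk fails during the repair, while high redundancy tolerates one further failure:

```latex
\begin{align*}
\mu &= n \,\lambda\, T_{\text{repair}} \\
P_{\text{normal}}(\text{surv}) &= P(N = 0) = e^{-\mu} \\
P_{\text{high}}(\text{surv}) &= P(N \le 1) = e^{-\mu} + \mu\, e^{-\mu} = (1 + \mu)\, e^{-\mu}
\end{align*}
```

With n = 8, λ = 1/1000000 per hour, and a 24-hour repair time, μ = 0.000192, which reproduces the figures above.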

Now consider an accelerated failure rate for disk drives, which is often a more realistic scenario given environmental causes of failure. The example below assumes a failure rate of once per month, that is, an MTBF of 720 hours (λ = 1/720 per hour), with the same 24-hour time to repair.

With ASM normal redundancy:

P(surv) = exp(-n * λ * Trepair)

= exp(-8 * (1/720) * 24)

= 76.59%

With ASM high redundancy:

P(surv) = (1 + n * λ * Trepair) * exp(-n * λ * Trepair)

= (1 + 8 * (1/720) * 24) * exp(-8 * (1/720) * 24)

= 97.02%

As you can see, accelerated failure rates yield much lower survival probabilities than independent failure rates: with normal redundancy, the probability of survival drops from 99.98% to 76.59%. Furthermore, ASM disk groups configured with high redundancy offer much better protection in an accelerated failure rate scenario than normal redundancy ASM disk groups.
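The four calculations above can be reproduced with a short script. This is a minimal sketch, not part of the original text: the function names `p_surv_normal` and `p_surv_high` and the constant `HOURS_PER_MONTH` are hypothetical names chosen for illustration, and the script assumes all times are expressed in hours, as in the examples.

```python
import math

HOURS_PER_MONTH = 720  # 30 days * 24 hours, the monthly MTBF used in the text


def p_surv_normal(n, mtbf_hours, repair_hours):
    """Survival probability for ASM normal redundancy: exp(-n * lambda * Trepair)."""
    mu = n * repair_hours / mtbf_hours  # n * lambda * Trepair, with lambda = 1 / MTBF
    return math.exp(-mu)


def p_surv_high(n, mtbf_hours, repair_hours):
    """Survival probability for ASM high redundancy: (1 + mu) * exp(-mu)."""
    mu = n * repair_hours / mtbf_hours
    return (1 + mu) * math.exp(-mu)


if __name__ == "__main__":
    partner_disks = 8   # partner disks per mirror, as stated in the text
    repair_hours = 24   # time to repair a failed disk

    scenarios = (
        ("independent failures", 1_000_000),
        ("accelerated failures", HOURS_PER_MONTH),
    )
    for label, mtbf in scenarios:
        normal = p_surv_normal(partner_disks, mtbf, repair_hours)
        high = p_surv_high(partner_disks, mtbf, repair_hours)
        print(f"{label}: normal = {normal:.4%}, high = {high:.4%}")
```

Changing `repair_hours` shows how sensitive the result is to repair time: in both formulas, halving the repair time has the same effect as doubling the MTBF.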
