tested a sample of 1000 disks for a period of 1000 hours (i.e., about 41.7 days), and within that
period one disk failure occurred. According to the equation, the MTTF value is
1,000,000 hours. From the reader's point of view, an MTTF of 1,000,000 hours, equivalent to
roughly 114 years, is hard to interpret because no single disk could survive that long. In contrast,
AFR is expressed as a percentage, which indicates the expected probability of a disk
failing during 1 year of usage. For an MTTF of 1,000,000 hours, according
to equation (2.1) in Section 2.1, the equivalent AFR value is 0.87%, meaning that 0.87% of
all disks are expected to fail during 1 year of usage. Compared with MTTF, the advantage
of AFR in readability is easily seen.
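The conversion above can be sketched in a few lines. Equation (2.1) is not reproduced in this excerpt; the sketch below assumes the common exponential-lifetime relation AFR = 1 − exp(−8760/MTTF), which reproduces the 0.87% figure quoted in the text.

```python
import math

HOURS_PER_YEAR = 24 * 365  # 8760 hours

def mttf_to_afr(mttf_hours: float) -> float:
    """Annualized failure rate (%) from MTTF in hours, assuming
    exponentially distributed disk lifetimes (an assumption here,
    since equation (2.1) is not shown in this excerpt)."""
    return (1 - math.exp(-HOURS_PER_YEAR / mttf_hours)) * 100

# Sample test from the text: 1 failure among 1000 disks run for 1000 hours
mttf_hours = 1000 * 1000 / 1          # 1,000,000 hours
print(round(mttf_to_afr(mttf_hours), 2))   # 0.87
```

For MTTF values much larger than one year, this is numerically close to the simpler linear approximation 8760/MTTF.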
Second, as mentioned in Section 2.1, MTTF is obtained in industrial testing by running
many disks for a specific period of time. In contrast, AFR is obtained from real
scenarios by checking the running history of disks in the system via system logs. Therefore,
the AFR value better reflects the actual reliability level of disks in a real storage system.
In addition, much existing research conducted by industry researchers applies AFR for disk
reliability evaluation. In this topic, results from existing industrial research are thoroughly
investigated and applied in our evaluation as well.
Based on the AFR disk reliability metric, data reliability is presented in a
similar style. In our novel reliability model, data reliability is described in the form
of an annual survival rate, which indicates the proportion of the data that survives
1 year of storage.
4.1.2 Data reliability model type
As mentioned in Section 2.2, two types of data reliability model have been identified
in the literature reviewed: those based on simple permutations and combinations and
those based on more complicated Markov chains. In this topic, we apply the
former to our novel data reliability model for the following two reasons.
First, with the design based on simple permutations and combinations, a variable disk
failure rate can be incorporated into the model relatively easily compared with the Markov chain
type. In the existing Markov chain reliability models that we have reviewed, the disk failure
rates are all treated as constants [4,19,60]. The complexity of the models could be one
of the major reasons for this. Solving the extremely complicated functions of a Markov
chain reliability model involves many complex matrix operations, which can incur
large computing overhead. Although we have not tested the complexity of solving a Markov
chain reliability model with variable failure rates, we can foresee that it would be
substantially higher, which is undesirable for our data reliability assurance mechanism.
Second, in our research we pursue a reduction in the number of replicas stored for Cloud
data. As will be mentioned later, our data reliability assurance mechanism stores
no more than two replicas for each piece of Cloud data. Therefore, the data reliability model
based on simple permutations and combinations is sufficient for the job, 1 while building
the complicated state diagram of a Markov chain reliability model to analyze a very
high data redundancy level becomes unnecessary.
1 In fact, as will be explained later, our novel data reliability model is also able to describe the reliability of
data with more replicas.
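The permutations-and-combinations style of model can be illustrated with a minimal sketch. This is not the book's actual model; it assumes independent disk failures and that data survives the year if at least one of its replicas resides on a surviving disk.

```python
def annual_survival_rate(afr_percent: float, replicas: int) -> float:
    """Probability that data with `replicas` independent copies
    survives one year, given a per-disk AFR in percent.
    (Illustrative assumption: failures are independent and data is
    lost only if every replica's disk fails within the year.)"""
    p_fail = afr_percent / 100
    return 1 - p_fail ** replicas

# With the AFR of 0.87% from the text and two replicas:
print(annual_survival_rate(0.87, 2))
```

With two replicas the loss probability is the product of two small per-disk failure probabilities, which is why the simple combinatorial form suffices at low redundancy levels.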