Minimum replication for meeting the data reliability requirement - Reliability Assurance of Big Data in the Cloud


of replicas for data storage in our data reliability assurance solution, the minimum replication could also be used as a benchmark for evaluating different approaches. It shows the theoretical minimum data redundancy level that a replication-based data storage system can maintain without jeopardizing the data reliability requirement. Using this benchmark, both the cost-effectiveness of a replication-based data storage system and its ability to provide data reliability assurance can be clearly presented, as described next.

Consider a data file set $F(f_1, f_2, f_3, \ldots, f_m)$ managed by a replication-based system $S(d_1, d_2, d_3, \ldots, d_n)$ with the data reliability requirement set $RR(r_1, r_2, r_3, \ldots, r_m)$, where $f_i(r_{i1}, r_{i2}, r_{i3}, \ldots, r_{ip})$ denotes a data file in $F$ and $d_q$ denotes a disk in $S$. $r_{ij}(d_q)$ denotes the $j$th replica of $f_i$, which is stored on disk $d_q$. To avoid searching for the disks that store all the replicas of each data file, the disk failure rate patterns are obtained from randomly selected disks. Applying the minimum replication approach to each $f_i$ in $F$ yields the minimum replication $min_i$ for each $f_i$. The minimum replication level for storing data file set $F$ can then be described by equation (5.9):

$$
MIN_S = \frac{\sum_{i=1}^{m} min_i}{m} \tag{5.9}
$$

When the current replication level in system $S$ is close to $MIN_S$, the data stored in the system are maintained cost-effectively. However, when the current replication level is lower than $MIN_S$, the data redundancy level of the system is too low to provide sufficient data reliability assurance, and the data reliability requirement could be jeopardized.
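The benchmark comparison can be illustrated with a minimal Python sketch. It assumes that the per-file minimum replica counts have already been obtained from the minimum replication approach, and that the "current replication level" is the average number of replicas actually stored per file. The function names and the `tolerance` threshold used to decide what counts as "close to" the benchmark are hypothetical and are not taken from the book.

```python
def min_replication_level(min_replicas):
    """Equation (5.9): average of the per-file minimum replica counts min_i."""
    if not min_replicas:
        raise ValueError("file set F must not be empty")
    return sum(min_replicas) / len(min_replicas)


def current_replication_level(replica_counts):
    """Assumed definition: average number of replicas currently stored per file."""
    if not replica_counts:
        raise ValueError("file set F must not be empty")
    return sum(replica_counts) / len(replica_counts)


def assess(min_replicas, replica_counts, tolerance=0.1):
    """Compare the current level with the benchmark.

    `tolerance` is an illustrative assumption: a current level within
    10% above the benchmark is treated as "close to" it.
    """
    benchmark = min_replication_level(min_replicas)
    current = current_replication_level(replica_counts)
    if current < benchmark:
        return "insufficient"      # reliability requirement may be jeopardized
    if current <= benchmark * (1 + tolerance):
        return "cost-effective"    # close to the theoretical minimum
    return "above benchmark"       # more redundancy than the minimum requires


# Four files whose minimum replica counts are 1, 2, 2 and 3.
minimums = [1, 2, 2, 3]
print(min_replication_level(minimums))
print(assess(minimums, [1, 2, 2, 3]))
print(assess(minimums, [1, 1, 2, 3]))
print(assess(minimums, [3, 3, 3, 3]))
```

Note that an average can hide individual files that fall below their own minimum, so a per-file check would be needed in addition to this system-level comparison.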

5.3 Evaluation of the minimum replication calculation approach

In this section, we briefly present the results of our evaluation of the minimum replication calculation approach to give an intuitive understanding of its effectiveness. The evaluation is conducted by running a minimum replication algorithm, which is essentially the implementation of the minimum replication approach and runs as part of our data reliability assurance mechanism, to be presented in Chapter 6. Because the minimum replication algorithm is described in Chapter 6, the details of the experiments are presented there as well.

We evaluate the algorithm under different data reliability requirements and with different configurations, including failure rate types and calculation equations. The evaluation covers two aspects: the execution time of the algorithm, and the accuracy rate of the output of the optimized algorithm compared with that of the original algorithm (see Section 6.5 for more details). The execution time reflects the computing overhead of the minimum replication calculation approach, while the accuracy rate reflects the effectiveness of our optimization of the minimum replication calculation approach presented in Section 5.1.
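The two evaluation aspects can be sketched as a small measurement harness. This is only an illustration under stated assumptions: the accuracy rate is taken here to be the fraction of inputs for which the optimized algorithm returns the same output as the original one (the precise definition is given in Section 6.5 and may differ), and `toy_original` and `toy_optimized` are hypothetical stand-ins, not the algorithms from Chapter 6.

```python
import time


def evaluate(original, optimized, requirements):
    """Run both algorithms on every reliability requirement and report
    total execution time per algorithm plus the assumed accuracy rate."""
    if not requirements:
        raise ValueError("at least one reliability requirement is needed")
    matches = 0
    original_seconds = 0.0
    optimized_seconds = 0.0
    for requirement in requirements:
        start = time.perf_counter()
        expected = original(requirement)
        original_seconds += time.perf_counter() - start

        start = time.perf_counter()
        actual = optimized(requirement)
        optimized_seconds += time.perf_counter() - start

        if actual == expected:
            matches += 1
    return {
        "accuracy_rate": matches / len(requirements),
        "original_seconds": original_seconds,
        "optimized_seconds": optimized_seconds,
    }


# Hypothetical stand-ins that map a reliability requirement to a replica
# count; they exist only to exercise the harness.
def toy_original(requirement):
    return 1 if requirement < 0.99 else 2


def toy_optimized(requirement):
    return 1 if requirement < 0.995 else 2


result = evaluate(toy_original, toy_optimized, [0.9, 0.99, 0.992, 0.999])
print(result["accuracy_rate"])
```

With these stand-ins, the two functions agree on two of the four requirements, so the reported accuracy rate is 0.5. Timings depend on the machine and are therefore not shown.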
