Minimum replication for meeting the data reliability requirement - Reliability Assurance of Big Data in the Cloud


To simplify the computation of equation (5.7), our solution consists of two major steps:

• First, based on the discrete disk failure rate pattern applied in our generic data reliability model, the average disk failure rate is converted into a piecewise function λ(t) of the storage duration. Given the failure rate pattern of the disk and the start time of the storage period, the average disk failure rate follows a piecewise function containing n subfunctions, where n is the number of different disk failure rates in the pattern after the start time. This transforms equation (5.7) into an equation in which t is the only independent variable, with the variable λ eliminated.

• Second, after this first conversion, equation (5.7) has become a piecewise function, that is, several functions that each cover a specific period of storage duration. Because more equations now need to be solved to obtain the longest storage duration, the solving process is still time-consuming and expensive in terms of overhead. To speed up the solving process, the data reliability equation is further simplified to reduce the computational complexity. It is observed that the curve of data reliability with a single replica (i.e., $e^{-\lambda t}$) changes almost linearly when λt is in a certain range. Therefore, in this value range, the curve can be substituted by a straight line in λt without sacrificing much accuracy of the result. Assuming that the function of the substituted straight line is $f(\lambda t) = a\lambda t + b$, equation (5.7) can be simplified into equation (5.8):

$$RA_k(1)^{t} = 1 - (1 - a\lambda_1 t - b)(1 - a\lambda_2 t - b) \tag{5.8}$$
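The straight-line substitution in the second step can be illustrated with a minimal Python sketch. It assumes, for illustration only, that the line is the chord through $e^{-x}$ at both ends of the range [0, 0.1] for x = λt, and that the unsimplified two-replica reliability has the same shape as the right-hand side of equation (5.8) with $e^{-\lambda t}$ in place of the line. The range, the fitting method, the failure rates, and all function names are assumptions, not values taken from the text.

```python
import math


def chord_fit(x_lo, x_hi):
    """Return (a, b) of the line a*x + b that meets e^-x at x_lo and x_hi."""
    y_lo, y_hi = math.exp(-x_lo), math.exp(-x_hi)
    a = (y_hi - y_lo) / (x_hi - x_lo)
    b = y_lo - a * x_lo
    return a, b


def max_abs_error(a, b, x_lo, x_hi, samples=1000):
    """Largest sampled gap between e^-x and the line on [x_lo, x_hi]."""
    step = (x_hi - x_lo) / samples
    points = (x_lo + i * step for i in range(samples + 1))
    return max(abs(math.exp(-x) - (a * x + b)) for x in points)


def two_replica_exact(lam1, lam2, t):
    """Assumed exponential form: 1 - (1 - e^(-lam1*t)) * (1 - e^(-lam2*t))."""
    return 1 - (1 - math.exp(-lam1 * t)) * (1 - math.exp(-lam2 * t))


def two_replica_linear(lam1, lam2, t, a, b):
    """Right-hand side of (5.8): each exponential replaced by a*lam*t + b."""
    return 1 - (1 - a * lam1 * t - b) * (1 - a * lam2 * t - b)


# Hypothetical range for lam*t and hypothetical failure rates.
a, b = chord_fit(0.0, 0.1)
err = max_abs_error(a, b, 0.0, 0.1)
exact = two_replica_exact(0.02, 0.05, 1.0)
approx = two_replica_linear(0.02, 0.05, 1.0, a, b)
print(f"a={a:.5f}, b={b:.5f}, max line error={err:.2e}")
print(f"exact={exact:.6f}, linearized={approx:.6f}")
```

Widening the range increases the gap between the line and the exponential, which is why the substitution is restricted to a value range of λt.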

As the average disk failure rate can be expressed as a first-degree piecewise function of t, equation (5.8) is essentially a quartic function of t. Solving the original nonpolynomial equation (5.7) requires complicated equation-solving methods, such as trust-region equation-solving algorithms [86]. In contrast, the simplified equation (5.8) can be solved by methods for polynomial equations, which are much more efficient, and hence the calculation overhead can be significantly reduced.
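To see where the fourth degree comes from, the right-hand side of equation (5.8) can be expanded for a single piece of the piecewise function. The sketch below assumes that on that piece the two average failure rates are λ_i(t) = slope_i · t + intercept_i. The coefficient values and function names are hypothetical.

```python
def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists, lowest degree first."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, x in enumerate(p):
        for j, y in enumerate(q):
            out[i + j] += x * y
    return out


def rhs_coeffs(a, b, seg1, seg2):
    """Coefficients (lowest degree first) of
    1 - (1 - a*l1(t)*t - b) * (1 - a*l2(t)*t - b),
    where l_i(t) = slope_i * t + intercept_i on the current piece."""
    factors = []
    for slope, intercept in (seg1, seg2):
        # 1 - a*(slope*t + intercept)*t - b, written as a quadratic in t
        factors.append([1.0 - b, -a * intercept, -a * slope])
    product = poly_mul(factors[0], factors[1])
    coeffs = [-c for c in product]
    coeffs[0] += 1.0
    return coeffs


def horner(coeffs, t):
    """Evaluate a polynomial (lowest degree first) at t."""
    acc = 0.0
    for c in reversed(coeffs):
        acc = acc * t + c
    return acc


# Hypothetical line coefficients and (slope, intercept) pairs for one piece.
a, b = -0.95, 1.0
seg1, seg2 = (0.001, 0.02), (0.002, 0.05)
coeffs = rhs_coeffs(a, b, seg1, seg2)

# Cross-check the expansion against direct evaluation at one point.
t = 1.5
lam1 = seg1[0] * t + seg1[1]
lam2 = seg2[0] * t + seg2[1]
direct = 1 - (1 - a * lam1 * t - b) * (1 - a * lam2 * t - b)
print("degree:", len(coeffs) - 1)
print("expanded:", horner(coeffs, t), "direct:", direct)
```

Each piece of the piecewise function yields its own coefficient list. A polynomial root finder can then be applied piece by piece, keeping only roots that fall inside that piece's time range.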

In addition to the simplification described earlier, optimizations are also made to address the issue of solving the equation multiple times. To avoid the excessive overhead of solving equation (5.8) multiple times, all the calculations are conducted in one go when the data file is first created in the Cloud. As long as replicas of the data file are not lost, the solving process does not need to be conducted again, which results in better efficiency.
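The compute-once idea can be sketched as a small cache. The class, its method names, and the solver callable below are hypothetical. The sketch also assumes that the solver is rerun only when a replica is reported lost, which is one reading of the condition stated above.

```python
from typing import Callable, Dict, List


class StorageDurationCache:
    """Stores per-file solving results so the equation is not solved repeatedly."""

    def __init__(self, solver: Callable[[str], List[float]]) -> None:
        self._solver = solver  # hypothetical: file id -> storage durations
        self._results: Dict[str, List[float]] = {}
        self.solver_runs = 0   # counts how often the solver is invoked

    def _solve(self, file_id: str) -> List[float]:
        self.solver_runs += 1
        return self._solver(file_id)

    def on_file_created(self, file_id: str) -> None:
        # All calculations are done in one go at creation time.
        self._results[file_id] = self._solve(file_id)

    def durations(self, file_id: str) -> List[float]:
        # Later lookups reuse the stored result without solving again.
        return self._results[file_id]

    def on_replica_lost(self, file_id: str) -> None:
        # Assumption: a lost replica invalidates the stored result.
        self._results[file_id] = self._solve(file_id)


# Dummy solver standing in for the real equation-solving step.
cache = StorageDurationCache(lambda file_id: [1.0, 2.5])
cache.on_file_created("file-1")
for _ in range(3):
    cache.durations("file-1")
runs_before_loss = cache.solver_runs
cache.on_replica_lost("file-1")
runs_after_loss = cache.solver_runs
print(runs_before_loss, runs_after_loss)
```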

In Chapter 6, the minimum replication calculation approach is applied in our generic data reliability assurance mechanism, where we present the pseudocode of the approach together with the mechanism.

5.2 Minimum replication benchmark

By solving the corresponding inequalities and equations mentioned in Section 5.1, the minimum replication, that is, the minimum number of replicas required for meeting the data reliability requirement, is determined. In addition to finding the minimum number
