Motivating example and problem analysis - Reliability Assurance of Big Data in the Cloud


approach in the Cloud. The data reliability model should be able to describe the reliability of Cloud data throughout their life cycles, during which the data are stored at different redundancy levels and on different storage devices, with different failure rate patterns, at different stages.

To facilitate our research, the data reliability model should be consistent with the preceding analysis as well as the literature review in Chapter 2. First, from the hardware aspect, the model should be able to describe precisely the relationship between data reliability and the failure pattern of storage devices. As mentioned in Section 2.1, storage device failure is the source of storage failure and data loss. A precise description of the impact of storage devices on data reliability could substantially improve the model's ability to predict data reliability, that is, the data loss rate, after the data have been stored for a certain period of time.

Second, the data reliability model must be able to describe the reliability of Cloud data stored in the form of replicas. The number of replicas represents the redundancy level of the data, so the model needs to reflect the relationship between the data reliability level and the number of replicas. Third, in order to describe the reliability of Cloud data throughout their life cycles, the model must be able to reflect changes in the number of replicas, that is, the data redundancy level, so as to correspond to the life cycle stages of data creation, data maintenance, and data recovery.
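
The three requirements above can be illustrated with a minimal sketch. It is not the model developed in this book; it rests on assumptions made only for illustration: the disk failure rate is piecewise constant, replicas fail independently, and all replicas are intact at the start of each life cycle stage. The function names and the sample rates are hypothetical.

```python
import math


def replica_survival(rate_segments, start, end):
    """Probability that one replica survives from `start` to `end` (in years).

    `rate_segments` is a list of (seg_start, seg_end, failures_per_year)
    tuples describing an assumed piecewise-constant disk failure rate.
    """
    hazard = 0.0
    for seg_start, seg_end, rate in rate_segments:
        overlap = min(end, seg_end) - max(start, seg_start)
        if overlap > 0:
            hazard += rate * overlap  # accumulate the integrated failure rate
    return math.exp(-hazard)


def data_reliability(replicas, rate_segments, start, end):
    """Probability that at least one of `replicas` independent replicas survives."""
    single = replica_survival(rate_segments, start, end)
    return 1.0 - (1.0 - single) ** replicas


def life_cycle_reliability(stages, rate_segments):
    """Reliability over consecutive stages given as (start, end, replicas).

    Assumes every stage starts with all of its replicas intact, so the
    stage reliabilities multiply. This is a simplification for illustration.
    """
    result = 1.0
    for start, end, replicas in stages:
        result *= data_reliability(replicas, rate_segments, start, end)
    return result


# Hypothetical failure rate pattern: 5% per year in year 1, then 2% per year.
rates = [(0.0, 1.0, 0.05), (1.0, 3.0, 0.02)]
# Hypothetical life cycle: one replica in year 1, two replicas in years 2 and 3.
stages = [(0.0, 1.0, 1), (1.0, 3.0, 2)]
overall = life_cycle_reliability(stages, rates)
```

In this sketch the failure rate pattern enters through `rate_segments`, the redundancy level through `replicas`, and the change of redundancy level across the life cycle through `stages`.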

3.2.4.2 Minimum replication calculation and benchmark

When metadata such as data size, expected data storage duration, and data reliability requirements have been collected and the corresponding storage device has been determined, the interface between the Cloud and the storage user needs to determine, if necessary, the minimum number of replicas required when creating data replicas. The calculation should be fast and incur low overhead. Moreover, in order to facilitate the data maintenance mechanism, the minimum replication calculation approach must also predict the reliability of data that are stored for a certain period of time. However, with a variable disk failure rate pattern, the overhead of such a calculation could be a concern, and hence optimization is needed to reduce the overhead of the data reliability prediction process.
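
As a rough illustration of what such a calculation might look like, the sketch below searches for the smallest replica count that meets a reliability requirement. It is not the approach proposed in this book. It assumes that replicas fail independently, that no replica is recovered during the storage period, and that the variable disk failure rate can be approximated by one constant rate per year of disk age. All names and figures are hypothetical.

```python
import math


def single_replica_reliability(yearly_failure_rates, duration_years):
    """Survival probability of one replica stored for `duration_years`.

    `yearly_failure_rates[i]` is the assumed failure rate (failures per year)
    during year i of storage; the last rate is reused for any later years.
    """
    if not yearly_failure_rates:
        raise ValueError("at least one failure rate is required")
    hazard = 0.0
    remaining = duration_years
    last_index = len(yearly_failure_rates) - 1
    for i, rate in enumerate(yearly_failure_rates):
        if remaining <= 0:
            break
        # The final rate covers everything left; earlier rates cover one year.
        span = remaining if i == last_index else min(1.0, remaining)
        hazard += rate * span
        remaining -= span
    return math.exp(-hazard)


def minimum_replicas(yearly_failure_rates, duration_years,
                     required_reliability, max_replicas=10):
    """Smallest replica count meeting the requirement, or None if none does."""
    if not 0.0 < required_reliability < 1.0:
        raise ValueError("required_reliability must be strictly between 0 and 1")
    # Computed once: the variable-rate integration is the costly part.
    single = single_replica_reliability(yearly_failure_rates, duration_years)
    for k in range(1, max_replicas + 1):
        if 1.0 - (1.0 - single) ** k >= required_reliability:
            return k
    return None


# Hypothetical input: 4%, 2%, 3% failure rates in years 1 to 3, two-year storage.
needed = minimum_replicas([0.04, 0.02, 0.03], 2.0, 0.99)
```

Evaluating the single-replica reliability once and reusing it for every candidate replica count keeps the search cheap, which is one simple way to limit the overhead that a variable failure rate pattern introduces.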

3.2.4.3 Cost-effective data reliability assurance mechanism

For the maintenance of Cloud data throughout their life cycle, we need to design a data reliability assurance mechanism that could replace the conventional three-replica data storage strategy used in current Clouds. The design of a cost-effective data reliability assurance mechanism in the Cloud faces three major challenges:

• First, the mechanism should run in a cost-effective fashion so that the Cloud data storage cost can be reduced. This requires not only reducing the number of replicas, but also taking into account the overhead incurred in running the mechanism.

• Second, the mechanism should be able to utilize the computation and storage power of the Cloud effectively, so that the big data in the Cloud can be managed properly.
