Introduction - Reliability Assurance of Big Data in the Cloud


1 Introduction

With the rapid growth in the size of Cloud data, cost-effective data storage has become one of the key issues in Cloud research, yet the reliability of the huge amounts of Cloud data must still be fully assured. In this book, we investigate the trade-off between cost-effective data storage and data reliability assurance in the Cloud. The research takes the perspective of Cloud storage service providers and investigates how to provide a cost-effective data storage service while meeting the data reliability requirement throughout the whole Cloud data life cycle. This topic is important and of practical value to Cloud computing technology. In particular, for data-intensive applications, our research could dramatically reduce storage cost while meeting the data reliability requirement, and hence has a positive impact on promoting the deployment of the Cloud.

This chapter introduces the background knowledge and key issues of this research. It is organized as follows. Section 1.1 gives the definition of data reliability and briefly introduces current data reliability assurance technologies in the Cloud. Section 1.2 introduces the background knowledge related to Cloud storage. Section 1.3 outlines the key issues of the research. Finally, Section 1.4 presents an overview of the book structure.

1.1 Data reliability in the Cloud

The term “reliability” is widely used as an aspect of the service quality provided by hardware, systems, Web services, etc. In Standard TL9000, it is defined as “the ability of an item to perform a required function under stated conditions for a stated time period” [1]. Data reliability specifically refers to the reliability that data storage services and systems provide for the stored data, and it can be defined as “the probability of the data surviving in the system for a given period of time” [2]. While the term “data reliability” is sometimes used in the industry as a superset of data availability and various other topics, in this book we will stick to the definition of data reliability given above.
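To make the definition concrete, the following minimal Python sketch computes the probability that data survives for a given period. It rests on assumptions that are not part of the definition itself: each copy fails independently at a constant rate (an exponential lifetime model), and lost copies are never repaired. The function names and the failure rate used in the demonstration are hypothetical and chosen only for illustration. The multi-copy case relates to the data redundancy discussed later in this section.

```python
import math


def single_copy_reliability(failure_rate_per_year: float, years: float) -> float:
    """Probability that one copy survives `years`.

    Assumes a constant failure rate (exponential lifetime model).
    """
    return math.exp(-failure_rate_per_year * years)


def replicated_reliability(failure_rate_per_year: float, years: float,
                           replicas: int) -> float:
    """Probability that at least one of `replicas` copies survives `years`.

    Assumes copies fail independently and lost copies are never repaired.
    """
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    # The data is lost only if every copy is lost within the period.
    loss_of_one_copy = 1.0 - single_copy_reliability(failure_rate_per_year, years)
    return 1.0 - loss_of_one_copy ** replicas


if __name__ == "__main__":
    rate = 0.04  # hypothetical annual failure rate, for illustration only
    for n in (1, 2, 3):
        print(n, replicated_reliability(rate, 1.0, n))
```

Under these assumptions, each additional copy multiplies the probability of data loss by the loss probability of a single copy, so reliability rises quickly with the number of copies while storage cost rises linearly.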

Data reliability indicates the ability of the storage system to keep data consistent, and hence it is always one of the key metrics of a data storage/management system. In large-scale distributed systems, owing to the large number of storage devices in use, failures of storage devices occur frequently [3]. The importance of data reliability is therefore prominent, and these systems need better design and management to cope with frequent failures. Increasing the data redundancy level is an effective way to increase data reliability [4,5]. Among the several major approaches for increasing the data redundancy level, data replication is currently the most popular in distributed storage systems. At present, data replication has been widely adopted in many
