current distributed data storage/management systems in both industry and academia,
which include examples such as OceanStore [6], DataGrid [7], Hadoop Distributed
File System [8], Google File System [9], Amazon S3 [10], and so forth. In these
storage systems, several replicas are created for each piece of data. These replicas are
stored in different storage devices, so that the data have a better chance of surviving
when storage device failures occur.
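To make the benefit of replication concrete, the following is a minimal sketch rather than a model taken from the systems cited above. It assumes that each storage device fails independently with the same probability over some period, which real systems only approximate. The function name and parameters are hypothetical and chosen for illustration.

```python
def survival_probability(device_failure_prob: float, replicas: int) -> float:
    """Probability that at least one replica survives.

    Assumption (not from the source): each replica sits on a different
    device, and devices fail independently with the same probability.
    """
    if not 0.0 <= device_failure_prob <= 1.0:
        raise ValueError("device_failure_prob must be between 0 and 1")
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    # The data are lost only if every replica's device fails.
    return 1.0 - device_failure_prob ** replicas


if __name__ == "__main__":
    # Illustrative failure probability only; not a measured value.
    for n in (1, 2, 3):
        print(n, survival_probability(0.05, n))
```

Under this simplified assumption, each additional replica multiplies the probability of losing the data by the device failure probability, which is why a small number of replicas already improves the chance of survival substantially.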
In recent years, Cloud computing has emerged as the latest distributed computing
paradigm, which provides redundant, inexpensive, and scalable resources in a
pay-as-you-go fashion to meet various application requirements [11]. Since the advent of
Cloud computing in late 2007 [12], it has quickly become one of the most promising distributed
solutions in both industry and academia. With the rapid growth of Cloud
computing, the size of Cloud storage is expanding at a dramatic speed. It is estimated
that by 2015 the data stored in the Cloud will reach 0.8 ZB (i.e., 0.8 × 10^21 bytes or
800,000,000 TB), while more data are “touched” by the Cloud within their life cycles
[13]. With such a large amount of Cloud data to maintain, data reliability in the Cloud is
considered more important than ever before. However, due to the accelerating growth of
Cloud data, current replication-based data reliability management has become a
bottleneck for the development of Cloud data storage. For example, storage systems such as
Amazon S3, Google File System, and Hadoop Distributed File System all adopt a similar
data replication strategy, called the “conventional multi-replica replication strategy,” in
which a fixed number of replicas (normally three) are stored for all data to meet the
reliability requirement. Given the huge amounts of Cloud data being stored, these conventional
multi-replica replication strategies consume a large amount of storage resources for additional
replicas. This could have negative effects on both Cloud storage providers and users.
On the one hand, from the Cloud storage provider's perspective, the excessive consumption
of storage resources leads to a large storage overhead and increases the cost of providing
the storage service. On the other hand, from the Cloud storage user's perspective,
under the pay-as-you-go pricing model, the cost of the excessive storage resource usage is
ultimately passed on to the storage users. For data-intensive Cloud applications specifically, the
incurred excess storage cost could be huge. Therefore, Cloud-based applications have
put forward a higher demand for cost-effective management of Cloud storage. While the
requirement of data reliability should be met in the first place, data in the Cloud needs to
be stored in a highly cost-effective manner.
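The storage overhead of a fixed multi-replica strategy can be illustrated with a short calculation. This is a sketch under stated assumptions, not a pricing model of any particular provider: it assumes that cost is proportional to the raw bytes stored, and the function names, the 100 TB dataset, and the price per TB-month are hypothetical values chosen for illustration.

```python
def raw_storage_tb(logical_tb: float, replicas: int = 3) -> float:
    """Total storage consumed when every piece of data is stored `replicas` times."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return logical_tb * replicas


def replication_overhead_tb(logical_tb: float, replicas: int = 3) -> float:
    """Storage consumed by the additional replicas beyond the first copy."""
    return raw_storage_tb(logical_tb, replicas) - logical_tb


def monthly_storage_cost(logical_tb: float, price_per_tb_month: float,
                         replicas: int = 3) -> float:
    """Monthly cost, assuming cost scales linearly with raw storage (an assumption)."""
    return raw_storage_tb(logical_tb, replicas) * price_per_tb_month


if __name__ == "__main__":
    data_tb = 100.0   # hypothetical dataset size
    price = 20.0      # hypothetical price per TB-month, not a real tariff
    print(raw_storage_tb(data_tb))               # 300.0
    print(replication_overhead_tb(data_tb))      # 200.0
    print(monthly_storage_cost(data_tb, price))  # 6000.0
```

With three replicas, two thirds of the consumed storage holds additional copies, so under this linear cost assumption the bill is three times that of storing a single copy.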
1.2 Background of Cloud storage
In this section, we briefly introduce the background of Cloud storage.
First, we describe the distinctive features of Cloud storage systems. Second, we
describe the Cloud data life cycle.
1.2.1 Distinctive features of Cloud storage systems
Data reliability is closely related to the structure of the storage system and how the
storage system is being used. Different from other distributed storage systems, the
 