Cost-effective data reliability assurance for data maintenance - Reliability Assurance of Big Data in the Cloud - page 59

Table 6.3 Transfer speed for accessing data in Amazon S3 (KB/s)

Source \ Target   Oregon   Ireland   Singapore   Sydney   Local
Oregon              3372       170         184      172      86
Ireland              231      3284         211       36      54
Singapore            190       209        3466      202     107
Sydney               137       110         230     3205     224

throughput of standard storage and reduced redundancy storage are the same. In effect, this means that on Amazon S3 the data access performance of standard storage, which uses three replicas, is the same as that of reduced redundancy storage, which uses fewer than three replicas and sacrifices reliability. PRCR, which uses no more than two replicas without jeopardizing reliability, would therefore have the same data access performance as Amazon S3. In general, then, using PRCR causes no performance degradation.

Second, in the theoretical case where all replicas are stored in different regions, performance would be affected for some users. For example, suppose that on Amazon S3 three replicas are stored in different regions, such as Oregon, Ireland, and Sydney, in the traditional manner, while under PRCR only two replicas are stored, in Oregon and Ireland (i.e., no replica is stored in Sydney). Some users, such as those in Australia, would then see degraded data access performance, because access to either Oregon or Ireland (at 86 KB/s or 54 KB/s respectively from Swinburne) is slower than access to Sydney (at 224 KB/s from Swinburne). The impact for data stored with a single replica can be analyzed similarly. To address this issue, on one hand, research on data placement can be applied to minimize the performance impact; for example, the replica that is accessed least and/or is slowest can be the one eliminated. On the other hand, where access performance for certain data is the ultimate goal, extra replica(s) can be added at extra cost, which would not jeopardize the effectiveness of PRCR for data reliability.
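The Swinburne example above can be checked with a short calculation. The following is only an illustrative sketch: the speeds are the "Local" column of Table 6.3, while the function name `best_replica` and the assumption that a client always reads from its fastest available replica are ours, not part of PRCR.

```python
# "Local" column of Table 6.3: transfer speed (KB/s) between each
# Amazon S3 region and the local client at Swinburne.
LOCAL_SPEED_KBPS = {"Oregon": 86, "Ireland": 54, "Singapore": 107, "Sydney": 224}


def best_replica(replica_regions, speeds=LOCAL_SPEED_KBPS):
    """Return (region, speed) of the fastest replica for the local client.

    Assumption for illustration: the client always reads from the
    fastest region that holds a replica.
    """
    region = max(replica_regions, key=lambda r: speeds[r])
    return region, speeds[region]


# Traditional placement: three replicas in three regions.
traditional = best_replica(["Oregon", "Ireland", "Sydney"])
# PRCR placement from the example: two replicas, none in Sydney.
prcr = best_replica(["Oregon", "Ireland"])

# How many times slower the best PRCR replica is for this client.
slowdown = traditional[1] / prcr[1]
print(traditional, prcr, round(slowdown, 2))
```

For the Swinburne client, the best replica moves from Sydney (224 KB/s) to Oregon (86 KB/s), which is roughly a 2.6-fold reduction in transfer speed under this simple model.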

6.5.2 Cost-effectiveness of PRCR

This section evaluates the cost-effectiveness of PRCR in managing a large number of data files. Managing data files with PRCR incurs two major costs: the running overhead of PRCR and the cost of storing data replicas.

6.5.2.1 Running overhead

First, the running overhead of PRCR is evaluated. The main concern is the proportion of the total cost of each data file that the running overhead accounts for. Given the huge number of Cloud data files, PRCR nodes would normally be well loaded. The running overhead of each data file can therefore be derived by dividing the total PRCR running cost by the maximum capacity of the PRCR nodes.
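The division described above can be sketched as follows. This is a minimal illustration, not the book's evaluation: the function names are hypothetical, the numbers in the usage lines are placeholders rather than measured values, and the per-file total cost is assumed to be the running overhead plus the storage cost, following the two major costs named in Section 6.5.2.

```python
def running_overhead_per_file(total_running_cost, max_capacity_files):
    """Per-file running overhead: total PRCR running cost divided by the
    maximum number of files the PRCR nodes can manage (assumed well loaded)."""
    if max_capacity_files <= 0:
        raise ValueError("max_capacity_files must be positive")
    return total_running_cost / max_capacity_files


def overhead_proportion(overhead_per_file, storage_cost_per_file):
    """Share of a file's total cost taken by running overhead, assuming
    total cost = running overhead + replica storage cost."""
    total = overhead_per_file + storage_cost_per_file
    if total <= 0:
        raise ValueError("total cost must be positive")
    return overhead_per_file / total


# Placeholder figures for illustration only (not from the source).
overhead = running_overhead_per_file(total_running_cost=100.0,
                                     max_capacity_files=1_000_000)
share = overhead_proportion(overhead, storage_cost_per_file=0.01)
print(overhead, share)
```

The larger the maximum capacity of the PRCR nodes, the smaller the per-file overhead becomes relative to the storage cost.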
