Table 6.3 Transfer speed for accessing data in Amazon S3 (KB/s)

Source \ Target   Oregon   Ireland   Singapore   Sydney   Local
Oregon              3372       170         184      172      86
Ireland              231      3284         211       36      54
Singapore            190       209        3466      202     107
Sydney               137       110         230     3205     224
throughput of standard storage and reduced redundancy storage are the same. In effect, this means that on Amazon S3 the data access performance of standard storage (which uses three replicas) is the same as that of reduced redundancy storage (which uses fewer than three replicas and sacrifices reliability). PRCR, which uses no more than two replicas without jeopardizing reliability, would therefore achieve the same data access performance as Amazon S3's standard storage. Hence, in general, using PRCR causes no performance degradation.
Second, in theory, when replicas are stored in different regions, reducing their number would affect performance for some users. For example, on Amazon S3, suppose a traditional scheme stores three replicas in different regions, such as Oregon, Ireland, and Sydney, whereas PRCR stores only two replicas, such as in Oregon and Ireland (i.e., no replica in Sydney). Users in Australia would then experience degraded data access: from Swinburne, reading from Oregon or Ireland runs at only 86 KB/s or 54 KB/s respectively, compared with 224 KB/s from Sydney (the Local column of Table 6.3). The impact of keeping only a single replica can be analyzed in the same way. This issue can be addressed in two ways. On the one hand, research on data placement, a separate area, can be conducted to minimize the performance impact; for example, the replica that is accessed least often and/or most slowly can be the one eliminated. On the other hand, where access performance for certain data is the ultimate goal, extra replica(s) can be added at extra cost, which would not jeopardize the effectiveness of PRCR for data reliability.
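To make the Australian example concrete, the sketch below transcribes Table 6.3 and computes the speed a user obtains under different replica placements. It assumes that each column of the table can stand for a user location and that a user always reads from its fastest replica; both are simplifying assumptions for illustration, not part of PRCR. The helper `least_harmful_removal` and the access counts passed to it are hypothetical: a greedy illustration of the "eliminate the replica accessed least and/or slowest" idea, not a data placement method from the source.

```python
from typing import Dict, List

# Table 6.3 transcribed: SPEED[source][target] in KB/s.
# Source = region whose S3 bucket holds the replica; target = where the data is read.
# "Local" corresponds to access from Swinburne, as used in the text.
SPEED: Dict[str, Dict[str, int]] = {
    "Oregon":    {"Oregon": 3372, "Ireland": 170,  "Singapore": 184,  "Sydney": 172,  "Local": 86},
    "Ireland":   {"Oregon": 231,  "Ireland": 3284, "Singapore": 211,  "Sydney": 36,   "Local": 54},
    "Singapore": {"Oregon": 190,  "Ireland": 209,  "Singapore": 3466, "Sydney": 202,  "Local": 107},
    "Sydney":    {"Oregon": 137,  "Ireland": 110,  "Singapore": 230,  "Sydney": 3205, "Local": 224},
}


def best_speed(replicas: List[str], user: str) -> int:
    """Speed (KB/s) seen by a user at `user`, ASSUMING it always reads its fastest replica."""
    return max(SPEED[r][user] for r in replicas)


def least_harmful_removal(replicas: List[str], access_counts: Dict[str, int]) -> str:
    """Hypothetical greedy heuristic (not from the source): return the replica whose
    removal keeps the access-weighted sum of best-replica speeds highest.
    `access_counts` maps a user location (a column of Table 6.3) to how often it reads the data."""
    def weighted_speed_without(r: str) -> int:
        remaining = [x for x in replicas if x != r]
        return sum(n * best_speed(remaining, u) for u, n in access_counts.items())
    return max(replicas, key=weighted_speed_without)


if __name__ == "__main__":
    three = ["Oregon", "Ireland", "Sydney"]
    print(best_speed(three, "Local"))                  # 224: Sydney replica available
    print(best_speed(["Oregon", "Ireland"], "Local"))  # 86: best of Oregon/Ireland
    print(best_speed(["Oregon"], "Local"))             # 86: single replica in Oregon
    print(least_harmful_removal(three, {"Local": 1, "Oregon": 1, "Ireland": 1}))    # Sydney
    print(least_harmful_removal(three, {"Local": 100, "Oregon": 1, "Ireland": 1}))  # Ireland
```

With equal access counts from Swinburne, Oregon, and Ireland, dropping the Sydney replica costs least in aggregate; if Swinburne reads the data 100 times as often as the other locations, Ireland becomes the replica to drop. Weighting by access counts reflects the "accessed least" criterion, while scoring by best-replica speed reflects "slowest".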
6.5.2 Cost-effectiveness of PRCR
This section evaluates the cost-effectiveness of PRCR in managing a large number of data files. Managing data files with PRCR incurs two major costs: the running overhead of PRCR itself and the cost of storing data replicas.
6.5.2.1 Running overhead
We first evaluate the running overhead of PRCR. Our main concern is what proportion of the total cost of each data file is taken up by the running overhead. Given the huge number of Cloud data files, PRCR nodes would normally be well loaded, so the running overhead per data file can be derived by dividing the total PRCR running cost by the maximum capacity of the PRCR nodes, i.e., the maximum number of data files they can manage.
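The per-file overhead calculation described above, and the share of a file's total cost it represents, can be sketched as follows. The sketch assumes that the total cost per file consists only of the two costs named earlier (running overhead plus replica storage), that all costs are expressed in the same currency and billing period, and that the maximum capacity is the maximum number of data files the PRCR nodes can manage. The function names and the input figures are hypothetical placeholders, not values from the evaluation.

```python
def running_overhead_per_file(total_running_cost: float, max_capacity_files: int) -> float:
    """Per-file PRCR running overhead, ASSUMING the PRCR nodes are fully loaded,
    so the total running cost is shared by the maximum number of files they manage:
        overhead_per_file = total_running_cost / max_capacity_files
    """
    if max_capacity_files <= 0:
        raise ValueError("max_capacity_files must be positive")
    return total_running_cost / max_capacity_files


def overhead_share(total_running_cost: float, max_capacity_files: int,
                   storage_cost_per_file: float) -> float:
    """Fraction of a file's total cost taken by the running overhead, where
        total_cost_per_file = overhead_per_file + storage_cost_per_file
    """
    overhead = running_overhead_per_file(total_running_cost, max_capacity_files)
    return overhead / (overhead + storage_cost_per_file)


if __name__ == "__main__":
    # Placeholder figures for illustration only, NOT measurements from the source.
    share = overhead_share(total_running_cost=50.0, max_capacity_files=1000,
                           storage_cost_per_file=0.45)
    print(f"{share:.1%}")  # prints 10.0%
```

Because the overhead is divided by the node capacity, the share shrinks as each PRCR node manages more files, which is why the well-loaded assumption matters for this estimate.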
 
 