Cost-effective data reliability assurance for data maintenance - Reliability Assurance of Big Data in the Cloud - page 58


Amazon Beanstalk, mainly for evaluating the metadata scanning procedure and the proactive replica checking procedure. The experimental PRCR consists of one user interface and one PRCR node, both of which run on a single Amazon EC2 instance. Based on the minimum replication algorithm, the execution times of the metadata scanning process and the proactive replica checking task are obtained.

In the experiments, the metadata scanning procedure and the proactive replica checking procedure are both simulated with several configurations. We use four types of Amazon EC2 compute instances to manage 3000 S3 objects (i.e., data files) stored with the standard storage service and the reduced redundancy storage service, respectively. Table 6.2 shows the results of the experiments. The metadata scanning time is on the order of hundreds of nanoseconds, and the proactive replica checking time is on the order of tens of milliseconds.
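As an illustration of how timings at this scale can be collected, the following is a minimal sketch, not the measurement code used in the experiments. The record layout, the `needs_check` predicate, and the `time_per_item_ns` helper are all hypothetical, and the sketch assumes the quantity of interest is the average time per record; the text here does not state how the figures in Table 6.2 were normalized.

```python
import time

def time_per_item_ns(procedure, items, repeats=5):
    """Return the best observed average time per item, in nanoseconds."""
    if not items:
        raise ValueError("items must not be empty")
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter_ns()
        for item in items:
            procedure(item)
        elapsed = time.perf_counter_ns() - start
        # Keep the fastest run to reduce the influence of scheduling noise.
        best = min(best, elapsed / len(items))
    return best

# Hypothetical metadata record: (object_id, time_at_which_next_check_is_due)
def needs_check(record, now=1_000):
    # A scan only inspects the metadata; it does not touch the stored data.
    return record[1] <= now

# 3000 in-memory records, matching the number of S3 objects in the experiments.
records = [(i, i) for i in range(3000)]
scan_ns = time_per_item_ns(needs_check, records)
print(f"scan: ~{scan_ns:.0f} ns per record")
```

A replica check would be timed the same way, but its procedure would involve a request to the storage service, which is consistent with checking times being several orders of magnitude larger than scanning times.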

6.5.1.3 Impact of PRCR on data access performance

Compared with the conventional three-replica strategy, PRCR reduces the number of replicas stored, which could potentially affect data access performance. To evaluate this, we conducted data access speed tests with Amazon S3 to analyze the impact of storing no more than two replicas compared with storing the conventional three replicas. As Amazon states that “latency and throughput for reduced redundancy storage are the same as for standard storage” (http://docs.aws.amazon.com/AmazonS3/latest/dev/UsingRRS.html), we conducted data access performance tests for standard Amazon S3 only. Specifically, we created AWS EC2 instances in different AWS regions and also used a local computer at Swinburne University of Technology in Melbourne, Australia, to measure the data transfer speed for accessing files stored in Amazon S3 in different regions. Apart from the location of the data source and the data transfer target, the configuration of the data transfer tests was the same in all cases.
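A transfer speed test of this kind can be sketched as follows. This is only an illustrative harness under stated assumptions: `measure_throughput_kb_s` and `fake_fetch` are hypothetical names, and `fake_fetch` is a local stand-in that generates bytes in memory. In a real test it would be replaced by a streaming read of an S3 object from the chosen region.

```python
import time

def measure_throughput_kb_s(fetch, chunk_size=64 * 1024):
    """Consume a chunked transfer and return (bytes_received, speed in KB/s)."""
    start = time.perf_counter()
    total = 0
    for chunk in fetch(chunk_size):
        total += len(chunk)
    elapsed = time.perf_counter() - start
    # Guard against a zero reading from a very coarse clock.
    speed = (total / 1024) / elapsed if elapsed > 0 else float("inf")
    return total, speed

def fake_fetch(chunk_size, size=1024 * 1024):
    """Hypothetical stand-in for a download: yields `size` bytes in chunks."""
    sent = 0
    while sent < size:
        n = min(chunk_size, size - sent)
        yield bytes(n)
        sent += n

total_bytes, speed_kb_s = measure_throughput_kb_s(fake_fetch)
print(f"received {total_bytes} bytes at {speed_kb_s:.0f} KB/s")
```

Keeping the harness identical and changing only the source and target locations mirrors the setup described above, so that differences in the measured speed can be attributed to location.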

The results are shown in Table 6.3. One major observation is clear: data transfer within the same region is always the fastest, with data transferred much more quickly than between different locations (> 3000 KB/s vs. < 300 KB/s). Based on these results, we draw the following conclusions about the impact of using PRCR on data access performance compared with the conventional three-replica strategy.

First, in the case that all replicas are stored in one region in practice as in Amazon S3, as addressed earlier, according to the description of Amazon S3, latency and

Table 6.2 Metadata scanning time and proactive replica checking time

                                      t1.micro   m1.small   m1.large   m1.xlarge
Scanning time                         ≈700 ns    ≈400 ns    ≈700 ns    ≈850 ns
Checking time (standard)              ≈27 ms     ≈27 ms     ≈30 ms     ≈27 ms
Checking time (reduced redundancy)    ≈25 ms     ≈24 ms     ≈37 ms     ≈23 ms
