the number of the two types of data files reaches a staggering 1:41, and the ratio
between the sizes of the two types of data files is about 2.34:1. Compared with the
two-replica plan, more than 95% of the replicas, amounting to 23% of the total data
size, are eliminated. Compared with the one-replica plan, the one- + two-replica plan
requires only 53% additional storage space for storing all the pulsar searching data
files (i.e., the data redundancy level is 1.53), and the data reliability requirement
of every data file is still met. For other Cloud applications with different data
compositions and data reliability requirements, the data redundancy level varies and
could be even lower than 1.53.
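The redundancy level quoted above can be read as a size-weighted average replica count. The following is a minimal sketch under stated assumptions: the helper names `redundancy_level` and `reduction_vs` are hypothetical, and the 0.53 input (the share of data, by size, kept in two replicas) is read back from the stated redundancy level of 1.53 rather than derived from the file ratios.

```python
def redundancy_level(two_replica_fraction: float) -> float:
    """Size-weighted average replica count when every file has 1 or 2 replicas.

    two_replica_fraction is the share of the total data size (0..1) that is
    stored with two replicas; the remainder is stored with one replica.
    """
    if not 0.0 <= two_replica_fraction <= 1.0:
        raise ValueError("two_replica_fraction must lie in [0, 1]")
    one_replica_fraction = 1.0 - two_replica_fraction
    return 1 * one_replica_fraction + 2 * two_replica_fraction


def reduction_vs(baseline_replicas: float, level: float) -> float:
    """Fraction of stored data saved relative to a fixed-replica baseline."""
    return 1.0 - level / baseline_replicas


# Assumption: 53% of the data (by size) needs two replicas.
level = redundancy_level(0.53)
print(round(level, 2))                   # 1.53
print(round(reduction_vs(2, level), 3))  # 0.235
```

Under these assumptions, the saving relative to the two-replica plan comes out at roughly 23.5% of the stored data, which is in line with the 23% figure given above.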
Figure 6.6 covers only the case in which the pulsar searching application processes
the data files from 8 minutes of observation. For the case presented in Section 3.1,
which processes the data files from 8 hours of observation a day, 30 days per month,
and assuming that Amazon S3 standard storage uses the three-replica strategy, the
storage cost per month is reduced from US$29,900 to US$15,300. Meanwhile, the running
cost of PRCR for managing this amount of data is only tens of dollars per month. The
storage cost saved by using PRCR can therefore be substantial. Moreover, here we
compared PRCR only with the conventional three-replica strategy. To manage data files
with very high data reliability requirements using the conventional strategy, even
three replicas may not be enough. Because PRCR stores no more than two replicas, the
storage cost reduction from using PRCR could be even greater.
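The monthly figures above are consistent with a storage bill that scales linearly with the average number of replicas kept. The sketch below is a rough check only. It assumes linear pricing (real Amazon S3 pricing is tiered), reuses the redundancy level of 1.53 from the simulation, and uses a hypothetical helper named `scaled_cost`.

```python
def scaled_cost(baseline_cost: float, baseline_replicas: float,
                replicas: float) -> float:
    """Scale a storage bill linearly with the average replica count.

    Assumes cost is proportional to the amount of data stored, which is a
    simplification of real Cloud storage pricing.
    """
    if baseline_replicas <= 0:
        raise ValueError("baseline_replicas must be positive")
    return baseline_cost * replicas / baseline_replicas


three_replica_cost = 29_900.0  # US$ per month, as quoted in the text

# Average of 1.53 replicas per byte under the one- + two-replica plan.
prcr_cost = scaled_cost(three_replica_cost, 3, 1.53)
print(round(prcr_cost))  # 15249

# Bounds when every file has exactly one or exactly two replicas.
lowest = scaled_cost(three_replica_cost, 3, 1)
highest = scaled_cost(three_replica_cost, 3, 2)
print(lowest < prcr_cost < highest)  # True
```

The result of about US$15,249 agrees with the quoted US$15,300 after rounding. The two bounds correspond to the extreme cases in which every file keeps one replica or every file keeps two.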
6.5.3 Summary of the evaluation
We evaluated PRCR in terms of performance and cost-effectiveness. For performance,
we tested the major procedures of the PRCR working process, including the minimum
replication algorithm, the metadata scanning process, and the proactive replica
checking process. The evaluation of the minimum replication algorithm also serves to
evaluate the minimum replication calculation approach presented in Chapter 5. We
conclude that PRCR is able to provide data reliability management for a wide range of
data reliability requirements with high performance. With regard to
cost-effectiveness, we found that the maximum capacity of PRCR suffices to provide
data reliability management for big data in the Cloud with a huge number of Cloud data
files at a very low running overhead. According to the data reliability management
simulation conducted, PRCR is able to minimize the storage cost without violating the
data reliability requirements. Compared with storage using the conventional
three-replica strategy, PRCR can reduce the storage cost by between one-third and
two-thirds, since it keeps one or two replicas of each file rather than three, while
the running overhead of PRCR itself is negligible.
6.6 Summary
In this chapter, we presented our data reliability assurance solution for big data in
the Cloud with a huge number of data files during the data maintenance stage, which is
a novel cost-effective data reliability assurance mechanism named PRCR (Proactive
 