In Section 4.2, we showed that the data reliability of a single replica with a
variable disk failure rate follows an exponential distribution, and hence the memoryless
property still holds. As the data reliability of each replica is independent, the
memoryless property should also hold for our generic data reliability model for multiple
replicas. According to this property, the data reliability for any period starting from any
given moment can be calculated. More importantly, according to our generic data reliability
model, a shorter storage duration results in a lower probability of data loss. Thus the
basic idea of managing data reliability based on proactive replica checking can be
formed: While a data file is stored in the Cloud, each replica of the data file is
checked periodically. The loss of replicas can be discovered and then recovered
within each allowed period, and this process is repeated throughout the storage duration. By
changing the duration of such a period, and hence the frequency of proactive
replica checking, a range of data reliability assurances can be provided. Based
on this idea, the PRCR mechanism is proposed.
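The effect of the checking period on the loss probability can be illustrated with a minimal
sketch. It assumes a constant disk failure rate (the model in the text allows a variable
rate) and two independent replicas; the function names and the rate of 0.05 per year are
hypothetical and chosen only for illustration.

```python
import math


def replica_reliability(failure_rate: float, duration: float) -> float:
    """Probability that one replica survives `duration` years.

    Assumes a constant failure rate, so reliability is exp(-rate * duration).
    """
    return math.exp(-failure_rate * duration)


def loss_probability(failure_rate: float, duration: float, replicas: int = 2) -> float:
    """Probability that all independent replicas are lost within `duration` years."""
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for small x.
    single_loss = -math.expm1(-failure_rate * duration)
    return single_loss ** replicas


rate = 0.05  # hypothetical failure rate per year
for days in (365, 90, 30, 7):
    p = loss_probability(rate, days / 365.0)
    print(f"check every {days:3d} days: loss probability per period = {p:.3e}")
```

Under these assumptions, shortening the period between checks lowers the probability that
both replicas are lost before the loss of the first one is discovered and repaired.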
By using PRCR, Cloud data files can be managed in different ways according to
their expected storage duration and reliability requirements: For data files that are
only for short-term storage and/or require no more data reliability than a single replica
can offer, one replica is sufficient; for data files that are for long-term
use and/or have a data reliability requirement higher than the reliability assurance
of a single replica, two replicas are stored and are periodically and proactively
checked. During the proactive replica checking, replicas of the data files are
accessed to check their existence.¹ The proactive replica-checking tasks are always
conducted before the reliability assurance drops below the reliability requirement.
Any single replica loss can be recovered in time when found, so that the reliability
of the data files can be ensured.
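The choice between one replica and two proactively checked replicas can be sketched as
follows. This is only an illustration under simplifying assumptions that are not taken from
the text: a constant failure rate, a lost replica restored at every check, and a decision
based on the annual reliability requirement alone (the expected storage duration is not
considered). The names `annual_reliability` and `plan_storage` are hypothetical.

```python
import math


def annual_reliability(failure_rate: float, interval: float) -> float:
    """Yearly reliability of two replicas checked every `interval` years.

    Assumes data is lost only if both replicas fail within the same interval.
    """
    per_interval_loss = math.expm1(-failure_rate * interval) ** 2  # (1 - e^(-rate*t))^2
    return (1.0 - per_interval_loss) ** (1.0 / interval)


def plan_storage(failure_rate: float, required: float, max_interval: float = 10.0):
    """Return (replica_count, checking_interval_in_years or None)."""
    if not 0.0 < required < 1.0:
        raise ValueError("required reliability must be strictly between 0 and 1")
    if math.exp(-failure_rate) >= required:
        return 1, None  # a single replica already meets the requirement
    # Reliability falls as the interval grows, so bisect for the longest
    # interval that still meets the requirement.
    lo, hi = 0.0, max_interval
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if annual_reliability(failure_rate, mid) >= required:
            lo = mid
        else:
            hi = mid
    return 2, lo


replicas, interval = plan_storage(failure_rate=0.05, required=0.9999)
print(replicas, "replicas, check at least every", round(interval * 365, 1), "days")
```

Checking no later than the returned interval corresponds to conducting each check before
the reliability assurance drops below the requirement.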
In some extreme cases, both replicas may be lost at the same time or within a small
time window (i.e., between two successive proactive checking tasks for the data file).
The probability of such a situation is already incorporated in the data reliability model.
Given a certain data reliability requirement, PRCR is responsible for maintaining the
data loss probability within the agreed range. For example, given a data reliability
requirement of 99.99% per year, PRCR ensures that the data loss rate is no greater
than 0.01% per year across all the data files, and hence the loss of both replicas does not
jeopardize the reliability assurance in overall terms.
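To make the phrase "in overall terms" concrete, the following small sketch converts a
per-file annual reliability into an expected number of lost files. The fleet size of one
million files and the function name are hypothetical; only the 99.99% figure comes from the
example above.

```python
def expected_annual_losses(file_count: int, annual_reliability: float) -> float:
    """Expected number of files lost per year at the given per-file annual reliability."""
    return file_count * (1.0 - annual_reliability)


# Hypothetical fleet of one million files at the 99.99% requirement from the text.
print(expected_annual_losses(1_000_000, 0.9999))
```

At 99.99% per year, roughly 100 files out of one million would be expected to be lost
annually, which is the bound the assurance is meant to keep, even though any individual
file may occasionally lose both replicas.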
6.2 Overview of PRCR
PRCR is a data reliability assurance and replica management mechanism designed for
managing big data in the Cloud, where the number of Cloud data files is huge. It is normally
run as a data reliability management service by Cloud storage
providers. By using PRCR, Cloud data files can be stored with minimum replication
while meeting the data reliability requirement.
¹ As the proactive replica checking is conducted within the same Cloud provider, we believe that
network instability is minimized. Therefore, a replica is considered to be lost when it cannot be accessed.
 