Cost-effective data reliability assurance for data maintenance - Reliability Assurance of Big Data in the Cloud


if one replica cannot satisfy the data reliability and storage duration requirements of the data file, the user interface requests the creation of a second replica by calling the Cloud service (see Chapter 7) and calculates the checking interval values of the data file.

2. According to the checking interval values of the data file, its metadata are distributed to the replica management module of the corresponding PRCR node. Otherwise, when one replica is sufficient to meet the data reliability requirement, only the original replica is stored and the metadata of the data file need not be created.

3. Metadata attributes of the data file are stored in the data table.

4. Metadata are scanned once in each scan cycle of the PRCR node. When the metadata are scanned, PRCR determines whether proactive replica checking is needed according to the time stamp and checking interval of the data file.

5. If proactive replica checking is needed, the replica management module obtains the metadata of the data file from the data table.

6. The replica management module assigns the proactive replica checking task to one of the unoccupied Cloud compute instances created in advance. The Cloud compute instance executes the task, in which both replicas of the data file are checked.

7. The Cloud compute instance takes further action according to the result of the proactive replica checking task: if both replicas are alive (or both are lost, which is very rare but still within the data reliability assurance range in overall terms), go to step 8; if only one replica is lost, the data recovery process is initiated, in which the compute instance calls the Cloud service (see Chapter 7) to generate a new replica from the replica that is still alive.

8. The Cloud compute instance returns the result of the proactive replica checking task. If neither replica is lost (or a lost replica has been recovered), the time stamp, checking interval, and new replica address (if applicable) of the data file are updated in the data table. Otherwise, a data loss alert is issued.

Note: Steps 4 to 8 form a continuous loop that runs until the expected storage duration is reached or the data file is deleted. When the expected storage duration is reached, either the storage user renews the PRCR service, or PRCR deletes the metadata of the data file and stops the proactive replica checking process for it.
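The loop formed by steps 4 to 8 can be sketched as a single scan cycle over the data table. This is a minimal illustration under stated assumptions, not the PRCR implementation: the names `FileMetadata`, `needs_check`, `scan_cycle`, `is_alive`, `create_replica`, and the `expiry` field are hypothetical; a check is assumed to be due once `now >= time_stamp + checking_interval`; the replica check runs inline rather than on a separate Cloud compute instance; and, because the text does not say how the checking interval is recomputed, the sketch leaves it unchanged.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class FileMetadata:
    # Hypothetical record for one row of the data table (field names are assumptions).
    file_id: str
    time_stamp: float          # time of the last proactive replica check
    checking_interval: float   # maximum time allowed between two checks
    replica_addresses: List[str]
    expiry: float              # end of the expected storage duration


def needs_check(meta: FileMetadata, now: float) -> bool:
    # Step 4 (assumed rule): a check is due once the interval has elapsed since the time stamp.
    return now >= meta.time_stamp + meta.checking_interval


def scan_cycle(table: Dict[str, FileMetadata], now: float,
               is_alive: Callable[[str], bool],
               create_replica: Callable[[str], str],
               alerts: List[str]) -> None:
    """Run one scan cycle over the data table (steps 4 to 8 and the note)."""
    for file_id in list(table):  # copy the keys so entries can be deleted while iterating
        meta = table[file_id]
        if now >= meta.expiry:
            # Note: storage duration reached without renewal, so drop the metadata
            # and stop checking (a renewal would instead extend `expiry`).
            del table[file_id]
            continue
        if not needs_check(meta, now):
            continue
        # Steps 5 to 7: check both replicas (done inline here, not on a compute instance).
        alive = [addr for addr in meta.replica_addresses if is_alive(addr)]
        if not alive:
            # Step 8, "otherwise" branch: both replicas are lost.
            alerts.append(file_id)
            del table[file_id]  # assumption: nothing is left to check
            continue
        if len(alive) < len(meta.replica_addresses):
            # Step 7: one replica is lost, so recover it from the surviving one.
            alive.append(create_replica(alive[0]))
        # Step 8: update the time stamp and replica addresses. The checking interval
        # would also be recomputed here; this sketch keeps it unchanged.
        meta.time_stamp = now
        meta.replica_addresses = alive
```

For example, if `is_alive` reports one lost replica of a file whose check is due, a single call to `scan_cycle` replaces the lost address with the one returned by `create_replica` and resets the file's time stamp to `now`, while files whose interval has not yet elapsed are left untouched.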

6.4 Optimization algorithms in PRCR

In Sections 6.2 and 6.3, we presented the high-level design of PRCR and its working process in detail. During the working process, additional algorithms are required so that all the data files can be maintained properly. In this section, we present two algorithms that support data reliability assurance and optimize the utilization of PRCR resources. First, we present the minimum replication algorithm, which determines the minimum number of replicas. Second, we present the metadata distribution algorithm, which maximizes the utilization of PRCR capacity. Both algorithms work within the user interface of PRCR.
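The decision that the minimum replication algorithm feeds into step 1 (store one replica, or add a second replica that PRCR checks proactively) can be illustrated with a small sketch. The reliability model of Chapter 5 is not reproduced here: we assume, purely for illustration, that each replica has an exponentially distributed lifetime with a constant failure rate, so one replica survives a storage duration T with probability exp(-λT). The function names `single_replica_reliability` and `minimum_replicas` are hypothetical.

```python
import math


def single_replica_reliability(failure_rate: float, duration: float) -> float:
    # Assumption: exponential lifetime with a constant failure rate (per unit time),
    # which may differ from the model used in Chapter 5.
    return math.exp(-failure_rate * duration)


def minimum_replicas(failure_rate: float, duration: float, required: float) -> int:
    """Return 1 if one replica alone meets the required reliability over the
    storage duration; otherwise 2, i.e. create a second replica and let PRCR
    proactively check both (as in step 1 of the working process)."""
    if single_replica_reliability(failure_rate, duration) >= required:
        return 1
    return 2
```

With an assumed failure rate of 0.01 per year, one replica kept for one year survives with probability about 0.990, so a 0.99 requirement needs only the original replica, whereas a five-year duration (about 0.951) or a 0.9999 requirement triggers the second replica.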

6.4.1 Minimum replication algorithm

We propose the minimum replication algorithm based on the minimum replication approach presented in Chapter 5. Based on this algorithm, minimum replicas are
