expected storage duration of the data file in one go, and returns the checking interval
values set as the result (lines 3-11).
Besides data storage with a variable disk failure rate, the algorithm is also applicable
when the disk failure rate is constant (e.g., virtual disks located over the virtual layer
of the Cloud could apply such a reliability model). In that case, the minimum replication
algorithm is significantly simplified, as the steps of calculating the average failure rate
(line 1) and obtaining piecewise functions (lines 5-6) can be omitted. Equation (6.1) only
needs to be solved once, and the checking interval obtained does not change unless a
replica of the data file is lost and the corresponding disk is changed.
6.4.2 Metadata distribution algorithm
To manage the large number of data files in the Cloud, PRCR must have a practically
sufficient capacity. Meanwhile, to fully use that capacity, the utilization of PRCR
nodes must be maximized. To address this issue, we propose our metadata distribution
algorithm. The algorithm has two purposes. First, it maximizes the utilization of PRCR,
so that the running cost of PRCR for maintaining each data file is minimized. Second, it
distributes the metadata of data files to the appropriate PRCR nodes, so that a
sufficient data reliability assurance $RA_k(1)$ can be provided for meeting the data
reliability requirement.
6.4.2.1 The maximum capacity of PRCR
The maximum capacity of PRCR stands for the maximum number of data files that
PRCR is able to manage. In PRCR, the main component for replica management is the
PRCR node. As mentioned in Section 6.2, PRCR may contain multiple PRCR nodes.
Therefore, the maximum capacity of PRCR is the sum of the maximum capacities of
all PRCR nodes. The maximum capacity of each PRCR node is determined by two
parameters, which are the metadata scanning time and the scan cycle of the PRCR
node. Note that the metadata scanning time is the time taken for scanning the metadata
of a data file in the data table. The maximum capacity of PRCR can be presented by
equation (6.2). In the equation, $C$ indicates the maximum capacity of PRCR,
$T_i^{cycle}$ is the scan cycle of PRCR node $i$, $T_i^{scan}$ is the metadata scanning
time of PRCR node $i$, and $N$ is the number of PRCR nodes in PRCR.

$$C = \sum_{i=1}^{N} \frac{T_i^{cycle}}{T_i^{scan}} \tag{6.2}$$
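As a rough illustration of equation (6.2), the sketch below sums the ratio of scan cycle to metadata scanning time over all PRCR nodes. The class name PRCRNode, its field names, and the numeric values are assumptions made for this example only; they do not come from the PRCR implementation.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class PRCRNode:
    """Hypothetical description of one PRCR node (names are illustrative)."""
    scan_cycle: float  # T_cycle: time for one full pass over the data table, in seconds
    scan_time: float   # T_scan: time to scan the metadata of one data file, in seconds


def node_capacity(node: PRCRNode) -> float:
    """Maximum number of data files one node can manage: T_cycle / T_scan."""
    return node.scan_cycle / node.scan_time


def prcr_capacity(nodes: Iterable[PRCRNode]) -> float:
    """Maximum capacity C of PRCR: the sum of the node capacities, as in equation (6.2)."""
    return sum(node_capacity(node) for node in nodes)


# Made-up values: a one-day and a ten-day scan cycle.
nodes = [
    PRCRNode(scan_cycle=86400.0, scan_time=0.5),
    PRCRNode(scan_cycle=864000.0, scan_time=0.25),
]
print(prcr_capacity(nodes))
```

The result is left as a ratio to mirror the equation; a deployment would presumably round each node's capacity down to a whole number of files.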
6.4.2.2 Provision of sufficient data reliability assurance
Although the maximum capacity of PRCR nodes can be calculated as just mentioned,
in order to provide sufficient data reliability assurance to the data files, the scan cycle
of the PRCR node must be no bigger than the checking interval values of data files.
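The constraint above can be sketched as a simple filter: a PRCR node is eligible for a data file only if its scan cycle does not exceed the file's checking interval. The function names are hypothetical. The tie-breaking rule in pick_node (prefer the eligible node with the longest scan cycle) is an assumption chosen for illustration, not a step stated in the text.

```python
from typing import List, Optional, Sequence


def eligible_nodes(checking_interval: float, scan_cycles: Sequence[float]) -> List[int]:
    """Indices of nodes whose scan cycle is no bigger than the checking interval."""
    return [i for i, cycle in enumerate(scan_cycles) if cycle <= checking_interval]


def pick_node(checking_interval: float, scan_cycles: Sequence[float]) -> Optional[int]:
    """Pick one eligible node, or None if no node scans often enough.

    Assumed policy: take the eligible node with the longest scan cycle, so
    that nodes with short cycles stay free for files that need frequent checks.
    """
    candidates = eligible_nodes(checking_interval, scan_cycles)
    if not candidates:
        return None
    return max(candidates, key=lambda i: scan_cycles[i])


# Made-up scan cycles (in days) for three PRCR nodes.
scan_cycles = [10.0, 30.0, 90.0]
print(pick_node(45.0, scan_cycles))  # a file whose checking interval is 45 days
```

A return value of None signals that no existing node can give the file sufficient data reliability assurance, so a node with a shorter scan cycle would be needed.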
 