Therefore, each data file should be managed by the PRCR node with a scan cycle of
the proper length. The scan cycle constraint of the PRCR node could lead to certain
underutilization of PRCR.
To maximize the utilization of PRCR while providing sufficient data reliability
assurance to the data files, the metadata distribution algorithm distributes the
metadata of each data file to the most appropriate PRCR node, according to the checking
interval values of the data file and the scan cycles of the PRCR nodes. The principle of
the algorithm is simple: It compares the checking interval values of the data file with
the scan cycle of each PRCR node. Among the PRCR nodes with a scan cycle smaller than
the checking interval values of the data file, the metadata are distributed to the node
(or a random one of several nodes) that has the largest scan cycle. The difference between
the checking interval of the data file and the scan cycle of a PRCR node indicates how
long before the checking interval is reached the proactive replica checking task is
conducted. When this difference is minimized, the metadata scanning and proactive
replica checking tasks are conducted on each data file as infrequently as possible,
so that the number of data files that a PRCR node is able to manage is maximized.
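The selection rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the book's implementation: the function name choose_prcr_node, the use of list indices to identify nodes, and the choice to return None when no node qualifies are all assumptions made for the example.

```python
import random
from typing import Optional, Sequence


def choose_prcr_node(
    checking_interval: float,
    scan_cycles: Sequence[float],
    rng: Optional[random.Random] = None,
) -> Optional[int]:
    """Return the index of the node that should hold the file's metadata.

    checking_interval is the minimum checking interval of the data file and
    scan_cycles[i] is the scan cycle of node i (same time unit for both).
    Returns None if no node has a scan cycle smaller than the interval
    (an assumption of this sketch; the text does not cover that case).
    """
    chooser = rng if rng is not None else random
    # Difference between the checking interval and every node's scan cycle.
    diffs = [(checking_interval - cycle, idx) for idx, cycle in enumerate(scan_cycles)]
    # Keep only nodes whose scan cycle is strictly smaller than the interval.
    eligible = [(diff, idx) for diff, idx in diffs if diff > 0]
    if not eligible:
        return None
    # The smallest difference corresponds to the largest eligible scan cycle.
    smallest = min(diff for diff, _ in eligible)
    candidates = [idx for diff, idx in eligible if diff == smallest]
    # Several nodes may share the same scan cycle, so pick one at random.
    return chooser.choice(candidates)
```

For example, with hypothetical scan cycles of 3, 7 and 12 time units and a checking interval of 10, the node with scan cycle 7 is selected: it is the slowest scanner that still checks the file before its interval is reached.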
The following presents the proof of the effectiveness of the metadata distribution
algorithm:
Theorem. Given multiple PRCR nodes with different scan cycles, the distribution of
metadata following the metadata distribution algorithm maximizes the utilization of all
the PRCR nodes.
Proof. Assume that all PRCR nodes reach the maximum capacity while all the metadata
are distributed by following the metadata distribution algorithm. Therefore, for
any data file f maintained by PRCR node A and any other PRCR node I with a scan
cycle bigger than that of A, letting CI(f) be the minimum checking interval of data
file f, we have ScanCycle(A) < CI(f) < ScanCycle(I). Without loss of generality, we
randomly create another metadata distribution other than the current one by swapping
the metadata of a pair of data files. Assume two PRCR nodes B and C, in which
ScanCycle(B) > ScanCycle(C). Assume that data files f1 and f2 are managed by PRCR node B
and PRCR node C, respectively. Swap their managing PRCR nodes. Since
CI(f2) < ScanCycle(B), the data reliability requirement of f2 cannot be met. Therefore,
data file f2 cannot be managed by PRCR by following the new metadata distribution.
Therefore, the utilization of PRCR nodes by following this new distribution is lower
than that by following the metadata distribution algorithm. According to the preceding
reasoning, it can be deduced that there is no other metadata distribution that has
higher utilization. Hence, the theorem holds.
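The swap step of the proof can be checked on a small numeric case. The sketch below is only an illustration under assumed values: the scan cycles (10 and 4), the checking intervals (12 and 6), and the helper name unmet_files are hypothetical and do not come from the text.

```python
from typing import Dict, List


def unmet_files(
    assignment: Dict[str, str],
    scan_cycle: Dict[str, float],
    checking_interval: Dict[str, float],
) -> List[str]:
    """Return the files whose reliability requirement is not met.

    A file is met only if the scan cycle of its managing node is strictly
    smaller than the file's minimum checking interval.
    """
    return sorted(
        name
        for name, node in assignment.items()
        if not scan_cycle[node] < checking_interval[name]
    )


# Hypothetical values: node B scans less often than node C.
scan_cycle = {"B": 10.0, "C": 4.0}
checking_interval = {"f1": 12.0, "f2": 6.0}

# Distribution produced by the algorithm: each file goes to the slowest
# node that still scans faster than the file's checking interval.
original = {"f1": "B", "f2": "C"}
# Distribution after swapping the managing nodes of f1 and f2.
swapped = {"f1": "C", "f2": "B"}

print(unmet_files(original, scan_cycle, checking_interval))
print(unmet_files(swapped, scan_cycle, checking_interval))
```

In the original distribution every file is checked in time. After the swap, f2 is managed by node B, whose scan cycle of 10 exceeds the checking interval of 6, so f2 can no longer be managed, which is the case the proof relies on.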
Figure 6.4 shows the pseudo code of the metadata distribution algorithm. In the
figure, CI indicates the minimum checking interval of the data file. S indicates the
set of all the PRCR nodes. The algorithm first calculates the differences between CI
and the scan cycles of all available PRCR nodes (lines 2-3). Then, from all the PRCR
nodes with a scan cycle smaller than CI, the ones with the smallest difference values
are selected as the candidates of the destination node (lines 4-6). Finally, one of
the candidates is randomly chosen as the destination node (line 7). The reason for
randomly choosing one node from the node set is to deal with the situation where
multiple PRCR nodes have the same scan cycle. The metadata distribution algorithm
is able to effectively optimize the utilization of all the PRCR nodes. However, in