15.4.4 Reuse of Cached Data Blocks
In addition to reusing old redundant data, we can also reduce data block movements by caching and reusing data blocks already received by the redundant node. Reconsider the previous example of computing c1,1 (cf. Equation (15.11)): the data blocks needed are v4, v8, and v9. v4 has already been sent to the redundant node when computing c0,1 (cf. Equation (15.7)) and thus can simply be reused if it is cached. However, as a redundant block is computed from a parity group of (N − h) data blocks, we only need to cache the (N − h) most recently received data blocks.
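One way to picture this cache is as a bounded, insertion-ordered buffer holding only the (N − h) most recently received data blocks, with the oldest block evicted when a new one arrives. The sketch below is illustrative only; the class and method names (`BlockCache`, `receive`) are assumptions, not from the chapter.

```python
from collections import OrderedDict

class BlockCache:
    """Illustrative cache of the (N - h) most recently received data
    blocks at the redundant node. Names here are hypothetical."""

    def __init__(self, n, h):
        self.capacity = n - h        # a parity group holds (N - h) data blocks
        self.blocks = OrderedDict()  # block id -> block payload, oldest first

    def receive(self, block_id, payload):
        # Store a newly received block, evicting the oldest one if full.
        self.blocks[block_id] = payload
        self.blocks.move_to_end(block_id)
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)

    def get(self, block_id):
        # Return the cached block if present; None means it must be
        # requested from the data nodes again.
        return self.blocks.get(block_id)
```

In the example above, v4 would be inserted via `receive` while computing c0,1 and then found by `get` when computing c1,1, avoiding one block transfer.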
15.4.5 Redundant Data Update Overhead
In this section we evaluate performance of the SRDU algorithm using simulation. Beginning
with a small system, we add new nodes to the system and then apply the SRDU algorithm to
update the redundant data blocks. Performance is measured by the number of data blocks that need to be sent to the redundant nodes for redundant data update, or simply the redundant data update overhead. The total number of data blocks is 40,000 and is fixed throughout the simulation. For simplicity, we present the redundant data update overhead for updating a single redundant node. We will return to the issue of updating multiple redundant nodes in
Section 15.5.
In the first experiment, we investigate the redundant data update overhead in continuous
system growth. We begin with a system of five data nodes and one redundant node. We then add new nodes to the system one by one; after each addition, the redundant data blocks are completely updated using the SRDU algorithm. This continues until the system grows to 400 data nodes. Figure 15.13 plots the redundant data update overhead versus the number of data nodes, from 6 to 400.
As expected, the simple algorithm (sending all data blocks to the redundant node) performs
the worst. On the other hand, regenerating redundant data using a centralized archive server
incurs the least overhead. These two curves serve respectively as the upper bound and lower
bound for redundant data update overhead.
Surprisingly, direct reuse of the original redundant data does not result in significant savings. This is because the algorithm maintains the same data order within the parity group in computing the redundant data, which significantly reduces the opportunity for redundant data reuse. Once this restriction is relaxed by reshuffling the parity group, the redundant data update overhead is reduced by half, to around 20,000 blocks. Reuse of cached data blocks further reduces the overhead by half, to around 10,000 blocks. With all three techniques applied, the SRDU algorithm can reduce the redundant data update overhead by as much as 70%.
In the previous experiment, we completely update all redundant data blocks before adding another new node. Clearly, this is inefficient if new nodes are added frequently or added to the system in a batch. To address this issue, we conduct a second experiment in which multiple nodes, say W, are added to the system before the redundant data are updated. Figure 15.14 plots the redundant data update overhead versus the batch size W for an initial system size of 80 nodes. The results show that we can significantly reduce the per-node redundant data update overhead by batching the redundant data update for multiple new nodes.
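The batching strategy amounts to running one SRDU update pass per group of W new nodes instead of one pass per node. A minimal driver sketch, where `update_redundancy` is a hypothetical placeholder for the SRDU update step (not an interface defined in the chapter):

```python
def grow_in_batches(system_size, target_size, batch_w, update_redundancy):
    """Grow the system from system_size to target_size data nodes,
    adding up to batch_w nodes at a time and running the redundant
    data update once per batch rather than once per node.
    Sketch only; update_redundancy stands in for the SRDU pass."""
    updates = 0
    while system_size < target_size:
        w = min(batch_w, target_size - system_size)
        system_size += w                 # add up to W new nodes at once
        update_redundancy(system_size)   # one update pass covers all W nodes
        updates += 1
    return updates
```

For example, growing from 80 to 100 data nodes with W = 5 triggers 4 update passes instead of the 20 required when updating after every single node addition, which is the amortization the experiment measures.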