15.4.4 Reuse of Cached Data Blocks
In addition to reusing old redundant data, we can also reduce data block movements by caching and reusing data blocks already received by the redundant node. Reconsider the previous example of computing c1,1 (cf. Equation (15.11)): the data blocks needed are v4, v8, and v9. v4 has already been sent to the redundant node when computing c0,1 (cf. Equation (15.7)) and thus can simply be reused if it is cached. However, as a redundant block is computed from a parity group of (N − h) data blocks, we only need to cache the (N − h) most recently received data blocks.
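One way to picture this cache is as a bounded, insertion-ordered buffer holding only the (N − h) most recently received data blocks, with the oldest block evicted when a new one arrives. The sketch below is illustrative only; the class and method names (`BlockCache`, `receive`) are assumptions, not from the chapter.

```python
from collections import OrderedDict

class BlockCache:
    """Illustrative cache of the (N - h) most recently received data
    blocks at the redundant node. Names here are hypothetical."""

    def __init__(self, n, h):
        self.capacity = n - h        # a parity group holds (N - h) data blocks
        self.blocks = OrderedDict()  # block id -> block payload, oldest first

    def receive(self, block_id, payload):
        # Store a newly received block, evicting the oldest one if full.
        self.blocks[block_id] = payload
        self.blocks.move_to_end(block_id)
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)

    def get(self, block_id):
        # Return the cached block if present; None means it must be
        # requested from the data nodes again.
        return self.blocks.get(block_id)
```

In the example above, v4 would be inserted via `receive` while computing c0,1 and then found by `get` when computing c1,1, avoiding one block transfer.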
15.4.5 Redundant Data Update Overhead
In this section we evaluate performance of the SRDU algorithm using simulation. Beginning
with a small system, we add new nodes to the system and then apply the SRDU algorithm to
update the redundant data blocks. Performance is measured by the number of data blocks that need to be sent to the redundant nodes for redundant data update, or simply the redundant data update overhead. The total number of data blocks is 40,000 and is fixed throughout the simulation. For simplicity, we present the redundant data update overhead for updating a single redundant node. We will return to the issue of updating multiple redundant nodes in
Section 15.5.
In the first experiment, we investigate the redundant data update overhead in continuous
system growth. We begin with a system of five data nodes and one redundant node. We then add new nodes to the system one by one; after each addition, the redundant data blocks are completely updated using the SRDU algorithm. This continues until the system grows to 400 data nodes. Figure 15.13 plots the redundant data update overhead versus the number of data nodes, from 6 to 400.
As expected, the simple algorithm (sending all data blocks to the redundant node) performs
the worst. On the other hand, regenerating redundant data using a centralized archive server
incurs the least overhead. These two curves serve respectively as the upper bound and lower
bound for redundant data update overhead.
Surprisingly, direct reuse of the original redundant data does not result in significant savings. This is because the algorithm maintains the same data order within the parity group in computing the redundant data, which significantly reduces the opportunity for redundant data reuse. Once this restriction is relaxed by reshuffling the parity group, the redundant data update overhead is reduced by half, to around 20,000 blocks. Reuse of cached data blocks further reduces the overhead by half, to around 10,000 blocks. With all three techniques applied, the SRDU algorithm can reduce the redundant data update overhead by as much as 70%.
In the previous experiment, we completely update all redundant data blocks before adding another new node. Clearly, this is inefficient if new nodes are added frequently or added to the system in a batch. To address this issue, we conduct a second experiment in which multiple nodes, say W, are added to the system before the redundant data are updated. Figure 15.14 plots the redundant data update overhead versus the batch size W for an initial system size of 80 nodes. The results show that we can significantly reduce the per-node redundant data update overhead by batching the redundant data update for multiple new nodes.
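The batching strategy amounts to running one SRDU update pass per group of W new nodes instead of one pass per node. A minimal driver sketch, where `update_redundancy` is a hypothetical placeholder for the SRDU update step (not an interface defined in the chapter):

```python
def grow_in_batches(system_size, target_size, batch_w, update_redundancy):
    """Grow the system from system_size to target_size data nodes,
    adding up to batch_w nodes at a time and running the redundant
    data update once per batch rather than once per node.
    Sketch only; update_redundancy stands in for the SRDU pass."""
    updates = 0
    while system_size < target_size:
        w = min(batch_w, target_size - system_size)
        system_size += w                 # add up to W new nodes at once
        update_redundancy(system_size)   # one update pass covers all W nodes
        updates += 1
    return updates
```

For example, growing from 80 to 100 data nodes with W = 5 triggers 4 update passes instead of the 20 required when updating after every single node addition, which is the amortization the experiment measures.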