Fault-Tolerant Algorithms for the Concurrent-Push Architecture - Scalable Continuous Media Streaming Systems - page 195

precomputed and distributed to the servers in a round-robin manner similar to a RAID-5 disk array. Note that a parity group spans all servers and hence the parity group size equals the number of servers in the system.
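To make the round-robin parity placement concrete, here is a minimal Python sketch under an assumed layout for the single-failure case ($K = 1$): with $N_S$ servers, each parity group holds $N_S - 1$ data units plus one parity unit, data unit $i$ is stored on server $i \bmod N_S$, and each group's parity unit goes to the one server its data units skip. The function names `place_group` and `layout` are hypothetical, and the exact layout used in the book may differ.

```python
from typing import Dict, List, Tuple


def place_group(group: int, n_s: int) -> Tuple[Dict[int, int], int]:
    """Return ({data_unit: server}, parity_server) for one parity group.

    Hypothetical RAID-5-style layout for K = 1: each group spans all n_s
    servers, holding n_s - 1 data units and one parity unit.
    """
    data_per_group = n_s - 1
    first = group * data_per_group
    # Data units continue round-robin across all servers.
    placement = {u: u % n_s for u in range(first, first + data_per_group)}
    # The parity unit takes the single server this group's data skipped,
    # so the parity location rotates from group to group.
    parity_server = (first + data_per_group) % n_s
    return placement, parity_server


def layout(num_groups: int, n_s: int) -> List[List[str]]:
    """Rows of labels indexed [group][server]; 'P' marks the parity unit."""
    rows = []
    for g in range(num_groups):
        placement, parity_server = place_group(g, n_s)
        row = [""] * n_s
        for unit, server in placement.items():
            row[server] = str(unit)
        row[parity_server] = "P"
        rows.append(row)
    return rows


if __name__ == "__main__":
    for g, row in enumerate(layout(3, 5)):
        print(f"group {g}: " + " ".join(f"{x:>2}" for x in row))
```

With five servers, the parity unit moves from server 4 to server 3 to server 2 over the first three groups, so over any $N_S$ consecutive groups every server stores exactly one parity unit. This is the point of the RAID-5-style rotation: no single server carries all the redundant data.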

In Chapter 9, we introduced two approaches for transmitting redundant data to the clients, namely, forward erasure correction (FEC) and progressive redundancy transmission (PRT). These two schemes represent different tradeoffs. FEC simplifies system implementation and, in certain cases, has lower client buffer requirements and start-up delay, at the expense of network bandwidth overhead during normal operation (i.e., with no failures). PRT reduces this bandwidth overhead at the expense of a more complicated system implementation and potentially larger buffer requirements and start-up delay. We present an FEC-based transmission scheme for concurrent push in the next section and a PRT-based transmission scheme in Section 11.3.

11.2 Forward Erasure Correction (FEC)

As the name suggests, servers under FEC transmit redundant data whether or not any server has failed. Because redundant data are always received, the client can recompute unavailable data through erasure correction (see Figure 11.3 for the case under sub-schedule striping). Hence, server failures need not be detected to maintain non-stop operation, and consequently no system reconfiguration is required either. This can greatly simplify the implementation and avoids other complications such as false alarms or undetected failures. The tradeoff is the extra network bandwidth required to deliver redundant data during normal-mode operation. Specifically, with $N_S$ servers and a redundancy level of $K$ (i.e., up to $K$ simultaneous server failures can be sustained), each parity group of $N_S$ units carries $N_S - K$ data units and $K$ redundant units, so the network bandwidth overhead incurred is given by

$$H_{\mathrm{FEC}} = \frac{K}{N_S - K} \tag{11.1}$$
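Equation (11.1) can be evaluated directly. The following minimal Python sketch (the helper name `fec_overhead` is hypothetical) computes it exactly with fractions so that the effect of system size and redundancy level is easy to tabulate.

```python
from fractions import Fraction


def fec_overhead(n_s: int, k: int) -> Fraction:
    """Normal-mode bandwidth overhead of FEC, Eq. (11.1): K / (N_S - K).

    Each parity group of n_s units carries n_s - k data units and k
    redundant units, all of which are transmitted even without failures.
    """
    if not 0 <= k < n_s:
        raise ValueError("require 0 <= K < N_S")
    return Fraction(k, n_s - k)


if __name__ == "__main__":
    # Overhead shrinks as the system grows and rises with the redundancy level.
    for n_s in (3, 5, 10):
        for k in (1, 2):
            print(f"N_S={n_s:2d}, K={k}: {float(fec_overhead(n_s, k)):.1%}")
```

For $N_S = 3$ and $K = 1$ it returns 1/2, i.e., a 50% overhead, while a ten-server system with $K = 1$ pays about 11%.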

For a small-scale system (i.e., small $N_S$) with a high level of redundancy (i.e., large $K$), this overhead could become prohibitive. For example, with $N_S = 3$ and $K = 1$, the overhead would become 50%. Considering that a VoD system is expected to operate mostly in normal mode,

[Figure: servers S0 to S4 holding stripe units and parity units (P); S3 has failed. Labels: "Stripe unit 3 recovered", "No recovery needed".]

Figure 11.3 Recovery of unavailable stripe units through erasure correction code
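The erasure correction depicted in Figure 11.3 can be sketched for the simplest case, $K = 1$. The Python below assumes RAID-5-style XOR parity, in which the parity unit is the bytewise XOR of the group's data units. This is an assumption: the section speaks only of a generic erasure correction code, and tolerating $K > 1$ simultaneous failures requires a stronger code such as Reed-Solomon. The function names are hypothetical.

```python
from functools import reduce
from typing import List, Optional


def xor_units(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length stripe units."""
    if len(a) != len(b):
        raise ValueError("stripe units must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(data_units: List[bytes]) -> bytes:
    """Single parity unit (K = 1): XOR of all data units in the group."""
    return reduce(xor_units, data_units)


def recover(received: List[Optional[bytes]]) -> List[bytes]:
    """Return the data units of one parity group.

    `received` lists the group's data units followed by its parity unit,
    with None marking a unit lost to a failed server.
    """
    missing = [i for i, u in enumerate(received) if u is None]
    if len(missing) > 1:
        raise ValueError("single parity corrects at most one erasure")
    units = list(received)
    if missing and missing[0] != len(units) - 1:
        # A lost data unit is the XOR of every surviving unit, parity included.
        units[missing[0]] = reduce(xor_units, [u for u in units if u is not None])
    # A lost parity unit needs no recovery: all data units are present.
    return units[:-1]


if __name__ == "__main__":
    data = [b"unit0", b"unit1", b"unit2", b"unit3"]
    parity = make_parity(data)
    # The server holding stripe unit 3 has failed; the client rebuilds it.
    print(recover(data[:3] + [None, parity]) == data)
```

The code distinguishes two cases: a lost data unit must be rebuilt from the surviving units, whereas a lost parity unit needs no recovery because every data unit has arrived. In both cases the client never has to know in advance which server failed, which is what lets FEC skip failure detection.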
