FEC versus PRT - Scalable Continuous Media Streaming Systems


13 FEC versus PRT

With data and capacity redundancy, a parallel-server streaming system can sustain server-level failures using either the Forward Erasure Correction (FEC) protocol or the Progressive Redundancy Transmission (PRT) protocol. Apart from its need for failure detection, PRT is superior to FEC because it consumes significantly less bandwidth for transmitting redundant data. Nevertheless, PRT may also reduce the reliability of the system if multiple servers fail within a short time. This chapter investigates this issue and, more generally, compares the reliability of FEC and PRT under the same conditions so that fair and meaningful comparisons can be made. Surprisingly, we find that by accepting a small increase in storage overhead, PRT can not only maintain similar or even better system-level reliability but also reduce the bandwidth overhead of sending redundant data by more than 50%.

13.1 Introduction

One challenge inherent in all parallel-server architectures is fault tolerance. In particular, a server failure, while uncommon, can cripple the entire system if redundancy is not incorporated. To tackle this problem, we can employ erasure-correction codes that enable the client to recover data lost on failed servers (cf. Chapter 11). If the recovery is done in real time, the process can even be made transparent to the end user, achieving non-stop streaming, which is highly desirable from a service-provisioning point of view.
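The erasure codes actually used are developed in Chapter 11. As a minimal sketch of the idea only, assuming the simplest possible code (one XOR parity unit per stripe, which tolerates exactly one failed server), client-side recovery looks like this. The helpers `xor_units`, `encode_stripe` and `recover_stripe` are hypothetical names; surviving several simultaneous failures requires codes with more redundant units per stripe.

```python
from __future__ import annotations

from functools import reduce


def xor_units(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length data units."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode_stripe(data_units: list[bytes]) -> list[bytes]:
    """Append one parity unit, the XOR of all data units, to a stripe.

    In a parallel-server system each unit of the stripe would be stored
    on a different server.
    """
    return data_units + [reduce(xor_units, data_units)]


def recover_stripe(received: list[bytes | None]) -> list[bytes]:
    """Return the data units of a stripe with at most one unit missing.

    `received` has one entry per server (parity last); None marks a unit
    that was lost because its server failed.
    """
    missing = [i for i, unit in enumerate(received) if unit is None]
    if len(missing) > 1:
        raise ValueError("a single parity unit corrects only one erasure")
    units = list(received)
    if missing:
        # The XOR of all surviving units (data and parity) equals the lost unit.
        units[missing[0]] = reduce(xor_units, [u for u in units if u is not None])
    return units[:-1]  # drop the parity unit, keep only the data


stripe = encode_stripe([b"ABCD", b"EFGH", b"IJKL"])
stripe[1] = None  # the server holding unit 1 has failed
print(recover_stripe(stripe))  # [b'ABCD', b'EFGH', b'IJKL']
```

Because the lost unit is rebuilt from units the client already receives, no retransmission from the failed server is needed, which is what makes real-time, transparent recovery possible.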

Note that for the client to perform erasure-correction computations, the servers must send redundant data units to the clients in addition to the normal data units. We introduced two such redundant data transmission protocols, Forward Erasure Correction (FEC) and Progressive Redundancy Transmission (PRT), in Chapter 9 and subsequently applied them to the concurrent-push architecture in Chapter 11.
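The protocols themselves are defined in Chapter 9. To make the bandwidth difference concrete, here is a minimal sketch under a deliberately simplified model that is our assumption and may differ from the chapter's definitions: each stripe spans n servers and holds n - h data units plus h redundant units; FEC always transmits all h redundant units, whereas PRT transmits only r <= h of them in normal operation and adds one more per detected server failure, up to the h that are stored. The function and parameter names are hypothetical, and the numbers in the usage lines are illustrative, not results from this chapter.

```python
from fractions import Fraction


def fec_overhead(n: int, h: int) -> Fraction:
    """FEC (simplified): all h redundant units of a stripe are always sent
    alongside its n - h data units."""
    if not 0 <= h < n:
        raise ValueError("need 0 <= h < n")
    return Fraction(h, n - h)


def prt_overhead(n: int, h: int, r: int, failed: int = 0) -> Fraction:
    """PRT (simplified assumption): r redundant units are sent in normal
    operation, plus one per detected server failure, capped at the h stored."""
    if not 0 <= r <= h < n:
        raise ValueError("need 0 <= r <= h < n")
    return Fraction(min(r + failed, h), n - h)


# Hypothetical configuration with 16 servers.
fec = fec_overhead(n=16, h=2)        # FEC storing 2 redundant units per stripe
prt = prt_overhead(n=16, h=3, r=1)   # PRT storing 3 but normally sending only 1
print(f"FEC {float(fec):.1%}, PRT {float(prt):.1%}")  # FEC 14.3%, PRT 7.7%
```

Under this model FEC's bandwidth overhead always equals its storage overhead h/(n - h), whereas PRT decouples the two: storing more redundant units costs disk space but not transmission bandwidth until failures are detected.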

Qualitatively, the PRT protocol is more complex, as it requires detecting server failures and dynamically reconfiguring the system to transmit more redundant data. Moreover, because less redundant data is transmitted, one would expect PRT to be less reliable than FEC.
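Reliability of this kind is commonly quantified by a mean time to failure (MTTF). As a minimal sketch of the modeling style only, assume n servers that fail independently at exponential rate `lam`, each failed server repaired independently at rate `mu`, and a system that fails once more than h servers are down at the same time. The number of failed servers then forms a birth-death continuous-time Markov chain, and the MTTF is its expected time to reach the absorbing state, computed here with the standard first-passage recursion for birth-death chains. The function name and parameters are hypothetical, and the sketch omits the failure-detection and system-reconfiguration effects that this chapter's model incorporates.

```python
def mttf_birth_death(n: int, h: int, lam: float, mu: float) -> float:
    """Mean time to failure of n servers that tolerate up to h concurrent failures.

    Assumed model: each working server fails at rate lam, each failed server
    is repaired independently at rate mu, and the system fails when h + 1
    servers are down at once. State i is the number of failed servers.
    """
    if not 0 <= h < n:
        raise ValueError("need 0 <= h < n")
    mttf = 0.0
    tau_prev = 0.0  # expected time to move from state i - 1 up to state i
    for i in range(h + 1):
        up = (n - i) * lam   # rate at which one more server fails
        down = i * mu        # rate at which a repair completes
        # First-passage recursion: up * tau_i = 1 + down * tau_{i-1}
        tau = (1.0 + down * tau_prev) / up
        mttf += tau          # MTTF from state 0 is the sum of all tau_i
        tau_prev = tau
    return mttf


# Hypothetical parameters: 16 servers, mean server life 10,000 h, mean repair 24 h.
for h in (0, 1, 2):
    print(f"h = {h}: MTTF = {mttf_birth_death(16, h, 1e-4, 1 / 24):,.0f} h")
```

For n = 2 and h = 1 the recursion reproduces the classic two-unit result MTTF = (3λ + μ)/(2λ²), a useful sanity check before adding detection and reconfiguration states to the chain.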

In this chapter, we investigate this reliability issue quantitatively by modeling the system as a continuous-time Markov chain to derive its mean time to failure (MTTF), incorporating the effects of server failure rate, server repair rate, failure detection and system reconfiguration
