13 FEC versus PRT
With data and capacity redundancy, a parallel-server streaming system can sustain server-
level failures using either the Forward Erasure Correction (FEC) protocol or the Progressive
Redundancy Transmission (PRT) protocol. Except for the need for failure detection, PRT
is superior to FEC as it consumes significantly less bandwidth overhead for redundant data
transmission. Nevertheless, PRT may also reduce the reliability of the system if multiple
servers fail within a short time. This chapter investigates this issue and, more generally,
compares the reliability of FEC and PRT under the same conditions so that fair and meaningful
comparisons can be made. Surprisingly, we discover that by allowing a small trade-off in
storage overhead, PRT can not only maintain similar or even better system-level reliability
but also reduce the bandwidth overhead of sending the redundant data by more than 50%.
13.1 Introduction
One challenge inherent in all parallel server architectures is fault tolerance. In particular, server
failure, while uncommon, can cripple the entire system if redundancies are not incorporated.
To tackle this problem, we can employ erasure correction codes to enable the client to recover
data lost in failed servers (cf. Chapter 11). If the recovery is done in real time, then the process
can even be made transparent to the end user, providing non-stop streaming, which is highly
desirable from a service-provisioning point of view.
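
To make the recovery idea concrete, the following is a minimal sketch that assumes the simplest possible erasure code: a single XOR parity unit per group of equal-length data units, one unit stored per server. This is an illustrative assumption, not necessarily the code used in this book, and the function names (xor_units, encode_group, recover_group) are hypothetical. A unit held by a failed server is modeled as None.

```python
from functools import reduce


def xor_units(units):
    """Bytewise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*units))


def encode_group(data_units):
    """Return the data units followed by one XOR parity unit."""
    return list(data_units) + [xor_units(data_units)]


def recover_group(received):
    """Rebuild the data units of a group with at most one erased unit.

    `received` lists one unit per server (parity last); a unit from a
    failed server is represented as None.
    """
    missing = [i for i, unit in enumerate(received) if unit is None]
    if len(missing) > 1:
        raise ValueError("single parity can repair only one erasure")
    received = list(received)
    if missing:
        survivors = [unit for unit in received if unit is not None]
        # XOR of all surviving units equals the erased unit.
        received[missing[0]] = xor_units(survivors)
    return received[:-1]  # drop the parity unit


# Usage: three data servers plus one parity server; server 1 fails.
group = encode_group([b"AAAA", b"BBBB", b"CCCC"])
group[1] = None
restored = recover_group(group)
```

Tolerating more than one simultaneous server failure requires a stronger code with more redundant units per group; the XOR scheme above only illustrates the principle of client-side recovery.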
Note that to enable the client to perform erasure correction computations, the servers need to
send the redundant data units in addition to the normal data units to the clients. We introduced
two such redundant data transmission protocols, Forward Erasure Correction (FEC) and
Progressive Redundancy Transmission (PRT), in Chapter 9 and subsequently applied them to
the concurrent-push architecture in Chapter 11.
Qualitatively, the PRT protocol is more complex as it requires the detection of server failure
and the dynamic reconfiguration of the system to transmit more redundant data. Moreover,
as fewer redundant data are transmitted, one would expect PRT to be less reliable than FEC.
In this chapter we investigate this reliability issue quantitatively by modeling the system as a
continuous-time Markov chain to derive its mean-time-to-failure (MTTF), incorporating the
effects of server failure rate, server repair rate, failure detection and system reconfiguration
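
As a rough illustration of how an MTTF can be obtained from a continuous-time Markov chain, the sketch below uses a deliberately simplified birth-death model, which is an assumption and not the model developed in this chapter: state i counts failed servers, each working server fails independently at rate fail_rate, each failed server is repaired independently at rate repair_rate, and the system fails once more than k_redundant servers are down. Failure detection and reconfiguration delays are ignored. The function and parameter names are hypothetical.

```python
def mttf_birth_death(n_servers, k_redundant, fail_rate, repair_rate):
    """Mean time to system failure, starting with all servers working.

    Simplified model (assumption): the system survives while at most
    k_redundant of n_servers are down. In state i the failure rate is
    (n_servers - i) * fail_rate and the repair rate is i * repair_rate.
    """
    if not 0 <= k_redundant < n_servers:
        raise ValueError("need 0 <= k_redundant < n_servers")
    if fail_rate <= 0 or repair_rate < 0:
        raise ValueError("need fail_rate > 0 and repair_rate >= 0")

    total = 0.0       # accumulated expected time to reach state k_redundant + 1
    prev_step = 0.0   # expected time to move from state i-1 to state i
    for i in range(k_redundant + 1):
        up = (n_servers - i) * fail_rate
        down = i * repair_rate
        # First-passage time from i to i+1 in a birth-death chain:
        # D_i = (1 + down * D_{i-1}) / up
        step = (1.0 + down * prev_step) / up
        total += step
        prev_step = step
    return total


# Usage: 2 servers, 1 redundant, rates in failures/repairs per unit time.
example = mttf_birth_death(2, 1, fail_rate=1.0, repair_rate=1.0)
```

The recurrence follows from conditioning on the first transition out of state i: with probability down / (up + down) the chain steps back and must first return to i before it can advance. Any model that adds detection or reconfiguration states no longer has this simple structure and would generally need a linear system solved over all transient states.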