Taxonomy and Architectural Alternatives - Scalable Continuous Media Streaming Systems

9.4.3 Detecting and Masking Server Failures

In Section 9.3.4 we discussed how data redundancy can be introduced among the servers to
support server-level fault tolerance. The redundant data enable the receiver to mask a server
failure by computing lost stripe units stored in the failed server from the parity units together
with the remaining stripe units. Using concepts similar to Forward Error Correction (FEC),
the servers can send redundant data along with normal data to the receiver at all times. In this
way, the receiver can recover lost packets using the received data together with the redundant
data. This approach can be extended to parallel video servers to recover stripe units lost in
failed servers.
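The recovery step can be illustrated with a minimal sketch. It assumes the simplest case of a single XOR parity unit per stripe; the text does not say which code is used, and the function names and sample data below are hypothetical.

```python
from functools import reduce


def xor_units(units):
    """Return the byte-wise XOR of equal-length stripe units."""
    if not units:
        raise ValueError("need at least one unit")
    length = len(units[0])
    if any(len(u) != length for u in units):
        raise ValueError("all units must have the same length")
    # zip(*units) walks the units column by column, one byte position at a time.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*units))


def make_parity(data_units):
    """Compute the parity unit stored on the redundant server (assumed XOR)."""
    return xor_units(data_units)


def recover_lost_unit(surviving_units, parity_unit):
    """Rebuild the one missing data unit from the survivors and the parity."""
    return xor_units(list(surviving_units) + [parity_unit])


# Hypothetical stripe of three data units, one per server.
stripe = [b"unit-A..", b"unit-B..", b"unit-C.."]
parity = make_parity(stripe)

# Suppose the server holding stripe[1] fails: the receiver still has the rest.
recovered = recover_lost_unit([stripe[0], stripe[2]], parity)
assert recovered == stripe[1]
```

A single XOR parity unit can mask only one lost unit per stripe; tolerating more failures requires a stronger erasure code, which this sketch does not attempt.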

FEC has the distinct advantage that the receiver does not need to detect a server failure.
As redundant data are always transmitted to the receiver, lost stripe units can readily be
recovered if a server fails. However, as in network communications, FEC incurs a constant
transmission overhead even when no server fails. According to coding theory, we need one
redundant symbol for every lost symbol we want to recover. Therefore, if we use K to denote
the number of lost symbols we want to recover per parity group (or stripe), the transmission
overhead will be K/(N_S − K). Note that this overhead could become significant for systems
having a small number of servers or a high level of redundancy.
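To get a feel for the magnitudes involved, the following sketch evaluates the overhead expression for a few configurations. It assumes N_S denotes the number of servers, so that each stripe carries N_S − K data units and K redundant units; the function name and the sample values are illustrative only.

```python
from fractions import Fraction


def fec_overhead(num_servers, k):
    """Redundant units sent per data unit: K / (N_S - K)."""
    if not 0 <= k < num_servers:
        raise ValueError("need 0 <= K < N_S")
    return Fraction(k, num_servers - k)


# Hypothetical configurations: (N_S, K).
for n_s, k in [(4, 1), (8, 1), (16, 1), (8, 3)]:
    overhead = fec_overhead(n_s, k)
    print(f"N_S={n_s:2d} K={k}: overhead = {overhead} ({float(overhead):.1%})")
```

With four servers and K = 1 the overhead is 1/3, whereas sixteen servers bring it down to 1/15. Raising K to 3 on eight servers pushes it to 3/5, which shows why both a small server count and a high redundancy level are costly.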

Alternatively, for redundancy levels larger than one (i.e., K > 1), we can adopt a Progressive
Redundancy Transmission (PRT) algorithm to reduce the transmission overhead. Specifically,
PRT initially transmits only a portion of the available redundant data rather than all of them.
When a server failure is detected, PRT dynamically requests the servers to begin transmitting
an additional redundant unit per stripe. For example, let K = 3; then the system can be
configured to initially transmit only one redundant unit. When a server fails, the system will
be able to mask the failure immediately using the available redundant unit. At the same time
the remaining servers will begin transmitting one more redundant unit per stripe to prepare for
a second server failure, and so on. As servers seldom fail simultaneously (unless hit by natural
disasters), this PRT algorithm can keep the transmission overhead low while still allowing the
system to survive multiple server failures.
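The progression can be traced with a toy model. This is only a sketch under stated assumptions: failures arrive one at a time, the extra redundant unit is in place before the next failure occurs, and the overhead with r redundant units in transmission is taken to be r/(N_S − K) by analogy with the FEC expression. The class name, method names, and the choice of N_S = 8 with K = 3 are hypothetical.

```python
from fractions import Fraction


class PrtSession:
    """Toy model of Progressive Redundancy Transmission for one client."""

    def __init__(self, num_servers, k, initial_redundancy=1):
        if not 0 < initial_redundancy <= k < num_servers:
            raise ValueError("need 0 < initial_redundancy <= K < N_S")
        self.num_servers = num_servers
        self.k = k
        # Redundant units currently transmitted per stripe.
        self.active_redundancy = initial_redundancy
        self.failed_servers = 0

    def overhead(self):
        """Assumed overhead: redundant units in transmission per data unit."""
        return Fraction(self.active_redundancy, self.num_servers - self.k)

    def on_server_failure(self):
        """Handle one detected failure; return True if it can be masked."""
        self.failed_servers += 1
        if self.failed_servers > self.k:
            return False  # more failures than the code can tolerate
        # Request one more redundant unit per stripe, up to the maximum K.
        self.active_redundancy = min(self.k, self.active_redundancy + 1)
        return True


session = PrtSession(num_servers=8, k=3)
print("no failure:", session.overhead())
for n in range(1, 5):
    masked = session.on_server_failure()
    print(f"failure {n}: masked={masked}, overhead={session.overhead()}")
```

In this model the session starts at an overhead of 1/5 instead of the 3/5 that always-on FEC would incur with the same parameters, and it reaches 3/5 only after a second failure has been detected.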

The challenge in PRT is to devise a way to detect server failures quickly and reliably.
The detection method must be quick enough to ensure that video playback continuity can
be sustained while the system requests redundant data for recovery. On the other hand, the
detection method must not generate too many false alarms, to avoid sending unnecessary
redundant data to the clients. In Chapter 14 we take a closer look at FEC and PRT by modeling
their availability to quantify the tradeoffs.
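The tension between speed and false alarms can be made concrete with a small sketch. The detection rule used here, declaring a server failed after a fixed number of consecutive stripe units miss their deadlines, is an assumption chosen for illustration and is not a method described in this section; all names and the sample trace are hypothetical.

```python
class MissCountDetector:
    """Suspect a server after `threshold` consecutive missed deadlines."""

    def __init__(self, threshold):
        if threshold < 1:
            raise ValueError("threshold must be at least 1")
        self.threshold = threshold
        self.consecutive_misses = {}

    def report(self, server_id, arrived_on_time):
        """Record one expected stripe unit; return True if the server is suspected."""
        if arrived_on_time:
            self.consecutive_misses[server_id] = 0
            return False
        misses = self.consecutive_misses.get(server_id, 0) + 1
        self.consecutive_misses[server_id] = misses
        return misses >= self.threshold


def first_alarm(threshold, trace):
    """Index of the first report that raises an alarm, or None if none does."""
    detector = MissCountDetector(threshold)
    for index, on_time in enumerate(trace):
        if detector.report("server-0", on_time):
            return index
    return None


# One transient late unit (index 1), then a real failure starting at index 3.
trace = [True, False, True, False, False, False]
print("threshold 1 alarms at index", first_alarm(1, trace))
print("threshold 3 alarms at index", first_alarm(3, trace))
```

A threshold of one reacts to the transient delay at index 1 and would trigger extra redundancy needlessly, whereas a threshold of three ignores it but takes two more units to confirm the real failure. The client buffer must be able to absorb that extra detection delay.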

9.5 Summary

In this chapter, we have introduced a framework for the design of parallel video server
architectures. We presented design alternatives and reviewed the existing literature on three
central architectural issues, namely video distribution architectures, server striping policies,
and video delivery protocols. Table 9.1 summarizes the design choices adopted in some of the
previous studies. In the next three chapters, we investigate in detail two specific parallel
architectures: the concurrent-push architecture and the staggered-push architecture.
