9.4.3 Detecting and Masking Server Failures
In Section 9.3.4 we discussed how data redundancy can be introduced among the servers to
support server-level fault tolerance. The redundant data enable the receiver to mask a server
failure by computing lost stripe units stored in the failed server from the parity units together
with the remaining stripe units. Using concepts similar to Forward Error Correction (FEC),
the servers can send redundant data along with normal data to the receiver at all times. In this
way, the receiver can recover lost packets using the received data together with the redundant
data. This approach can be extended to parallel video servers to recover stripe units lost in
failed servers.
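The recovery step described above can be illustrated with a minimal sketch. It assumes the simplest redundancy scheme, a single XOR parity unit per stripe (one tolerated failure); the text does not specify the code used, and the function names and sample stripe are hypothetical.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(units: list[bytes]) -> bytes:
    """Parity unit for one stripe: XOR of all data stripe units."""
    return reduce(xor_bytes, units)


def recover_lost_unit(surviving: list[bytes], parity: bytes) -> bytes:
    """Rebuild the single missing stripe unit from the survivors and the parity."""
    return reduce(xor_bytes, surviving, parity)


# One stripe spread over four data servers (assumed sample data).
stripe = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
parity = make_parity(stripe)

# Suppose the server holding unit 2 fails: the receiver only has the rest.
lost_index = 2
surviving = [u for i, u in enumerate(stripe) if i != lost_index]
recovered = recover_lost_unit(surviving, parity)
```

Because the parity unit arrives together with the data, the receiver can run the recovery step without first knowing which server failed; it only needs to notice that a stripe unit is missing.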
FEC has the distinct advantage that the receiver does not need to detect a server failure.
As redundant data are always transmitted to the receiver, lost stripe units
can readily be recovered if a server fails. However, as in network communications,
FEC incurs constant transmission overhead even when no server fails. According to
coding theory, we need one redundant symbol for every lost symbol we want to recover.
Therefore, if we use K to denote the number of lost symbols we want to recover per parity
group (or stripe), with N_S denoting the number of servers, the transmission overhead will be
K/(N_S - K). Note that this overhead
could become significant for systems having a small number of servers, or a high level of
redundancy.
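To see how quickly the overhead grows, the ratio of K redundant units to N_S - K data units can be evaluated for a few configurations. This is a small sketch; the function name and the sample server counts are illustrative assumptions, and N_S is taken to be the number of servers, with one stripe unit per server.

```python
def fec_overhead(num_servers: int, k: int) -> float:
    """Transmission overhead K / (N_S - K) for a stripe of N_S units,
    K of which are redundant (assumes one stripe unit per server)."""
    if not 0 <= k < num_servers:
        raise ValueError("need 0 <= K < N_S")
    return k / (num_servers - k)


# Overhead for a few assumed system sizes and redundancy levels.
for n_s in (5, 9, 17):
    for k in (1, 2):
        print(f"N_S={n_s:2d} K={k}: {fec_overhead(n_s, k):.1%}")
```

With five servers and K = 1 the overhead is 25%, whereas with 17 servers and K = 1 it drops to 6.25%, which shows why small systems pay proportionally more for always-on redundancy.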
Alternatively, for redundancy levels larger than one (i.e., K > 1), we can adopt a Progressive
Redundancy Transmission (PRT) algorithm to reduce the transmission overhead. Specifically,
PRT initially does not transmit all available redundant data but only a portion of them. When
a server failure is detected, PRT dynamically requests the servers to begin transmitting an
additional redundant unit per stripe. For example, let K = 3; then the system can be configured
to initially transmit only one redundant unit. When a server fails, the system will be able to mask
the failure immediately using the available redundant unit. At the same time the remaining
servers will begin transmitting one more redundant unit per stripe to prepare for a second server
failure, and so on. As servers seldom fail simultaneously (unless hit by natural disasters), this
PRT algorithm can keep the transmission overhead low while still allowing the system to
survive multiple server failures.
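The progressive behaviour can be sketched as a small state tracker. This is an illustrative model only, not the algorithm's actual implementation: the class and method names are hypothetical, failures are assumed to arrive one at a time, and the request for extra redundancy is assumed to take effect before the next failure.

```python
class PrtSession:
    """Tracks how many redundant units per stripe the servers are transmitting."""

    def __init__(self, k: int, initial_redundancy: int = 1):
        if not 1 <= initial_redundancy <= k:
            raise ValueError("need 1 <= initial_redundancy <= K")
        self.k = k                                   # redundant units available per stripe
        self.active_redundancy = initial_redundancy  # redundant units currently transmitted
        self.failed_servers = 0

    def on_server_failure(self) -> bool:
        """Record one detected failure; return True if it can be masked."""
        self.failed_servers += 1
        masked = self.failed_servers <= self.active_redundancy
        # Request one more redundant unit per stripe, up to the K available.
        if self.active_redundancy < self.k:
            self.active_redundancy += 1
        return masked


# Assumed configuration: K = 3, starting with one redundant unit in transit.
session = PrtSession(k=3)
history = []
for _ in range(4):
    masked = session.on_server_failure()
    history.append((session.failed_servers, session.active_redundancy, masked))
```

In this run the first three failures are masked while the transmitted redundancy steps from one unit up to three; a fourth failure exceeds K and cannot be masked.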
The challenge in PRT is to devise a way to detect server failures quickly and reliably.
The detection method must be quick enough to ensure that video playback continuity can
be sustained while the system requests redundant data for recovery. On the other hand, the
detection method must not generate too many false alarms, so as to avoid sending unnecessarily
large amounts of redundant data to the clients. In Chapter 14 we take a closer look at FEC and
PRT by modeling their availability to quantify the tradeoffs.
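The tension between detection speed and false alarms can be made concrete with a toy detector. The text does not prescribe a detection method; the sketch below assumes a hypothetical rule that declares a server failed after a fixed number of consecutive missed packets, and all names and sample traces are illustrative.

```python
from typing import Optional


class MissCountDetector:
    """Declares a failure after `threshold` consecutive missed packets (assumed rule)."""

    def __init__(self, threshold: int):
        if threshold < 1:
            raise ValueError("threshold must be at least 1")
        self.threshold = threshold
        self.consecutive_misses = 0

    def observe(self, packet_arrived: bool) -> bool:
        """Feed one expected-packet slot; return True once a failure is declared."""
        if packet_arrived:
            self.consecutive_misses = 0
        else:
            self.consecutive_misses += 1
        return self.consecutive_misses >= self.threshold


def first_alarm(threshold: int, arrivals: list[bool]) -> Optional[int]:
    """Index of the slot at which the detector first raises an alarm, if any."""
    detector = MissCountDetector(threshold)
    for slot, arrived in enumerate(arrivals):
        if detector.observe(arrived):
            return slot
    return None


# A healthy server with one transient packet loss, and a server that crashes.
transient_loss = [True, False, True, True]
crashed = [True, False, False, False, False]

hasty = (first_alarm(1, transient_loss), first_alarm(1, crashed))
cautious = (first_alarm(3, transient_loss), first_alarm(3, crashed))
```

A threshold of 1 flags the crash at the first missed slot but also raises a false alarm on the transient loss; a threshold of 3 ignores the transient loss but reacts two slots later to the real crash.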
9.5 Summary
In this chapter, we have introduced a framework for the design of parallel video server
architectures. We presented design alternatives and reviewed the existing literature on three central
architectural issues, namely video distribution architectures, server striping policies, and video
delivery protocols. Table 9.1 summarizes the design choices adopted in some of the previous
studies. In the next three chapters, we investigate in detail two specific parallel architectures:
the concurrent-push architecture and the staggered-push architecture.