11 Fault-Tolerant Algorithms for the Concurrent-Push Architecture
One potential problem with the concurrent-push architecture, and with parallel server
architectures in general, is reliability. Because the system distributes video data over
multiple servers, the failure of a single server will cripple the entire system. Worse still,
as the system is scaled up to serve more users, more servers are needed and the system-wide
reliability decreases accordingly. Drawing on similar principles from disk array research,
we present in this chapter fault-tolerant algorithms to improve the system reliability.
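To make the scaling argument concrete, the following minimal sketch computes the probability that every server is up at once. It assumes that servers fail independently and share the same availability; both assumptions, the function name, and the 0.99 figure are illustrative and do not come from the chapter.

```python
def all_servers_up_probability(per_server_availability: float, num_servers: int) -> float:
    """Probability that all servers are up at the same time.

    Assumes independent failures and identical per-server availability
    (illustrative assumptions, not taken from the chapter).
    """
    return per_server_availability ** num_servers


# Without redundancy, the system works only if every server works,
# so this probability shrinks as servers are added.
for n in (1, 4, 16, 64):
    print(n, round(all_servers_up_probability(0.99, n), 4))
```

Under these assumptions the printed value falls steadily as the number of servers grows, which is the effect the redundancy schemes in this chapter are meant to counter.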
In particular, we address three key problems in supporting fault tolerance in the
concurrent-push architecture, namely, redundancy management, the redundant data
transmission protocol, and real-time fault masking. First, redundant data based on erasure
codes are added to the video data stored in the servers and then delivered to the clients
to support fault tolerance. Despite the success of distributed redundancy striping schemes
such as RAID-5 in disk array implementations, we find that similar schemes extended
to the server context do not scale well. Instead, we develop a redundant server scheme that
is scalable and consumes less server buffer. Second, two protocols are introduced to
control the transmission of redundant data to the clients, namely, forward erasure correction
(FEC) and progressive redundancy transmission (PRT). These two protocols achieve different
tradeoffs among bandwidth overhead, implementation complexity, and client buffer
requirement. Finally, we derive the amount of client buffer required so that non-stop,
continuous video playback can be maintained during a server failure.
11.1 Redundancy Management
To support server-level fault tolerance, we need redundant data so that a client can re-compute
the unavailable video data after server failures. The problem of correcting data errors has been
studied extensively in the literature. According to coding theory [1], one can encode a set
of symbols with redundancies so that errors occurring within the set can be corrected later.
However, server failure is slightly different in the sense that there is really no error in the
coding sense. Instead, a server failure introduces erasures, that is, the absence of data.
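The difference between an error and an erasure can be illustrated with a small sketch. The chapter does not specify a particular erasure code at this point, so the example assumes the simplest one, a single XOR parity block per stripe, with one block stored per server. The function names and sample data are hypothetical. Because the client knows which server failed, the missing block is marked explicitly (here with None) and can be recomputed from the surviving blocks.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode(data_blocks):
    """Return the data blocks followed by one XOR parity block (assumed code)."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def recover(received):
    """Fill in a single erased block, marked None, from the surviving blocks."""
    missing = [i for i, block in enumerate(received) if block is None]
    if len(missing) > 1:
        raise ValueError("single parity can repair only one erasure")
    repaired = list(received)
    if missing:
        survivors = [block for block in received if block is not None]
        # XOR of all surviving blocks (data and parity) equals the erased block.
        repaired[missing[0]] = xor_blocks(survivors)
    return repaired


# One stripe spread over three data servers plus one parity server.
stripe = encode([b"vid0", b"vid1", b"vid2"])

# Server 1 fails: its block is absent, not corrupted, and its position is known.
damaged = list(stripe)
damaged[1] = None
print(recover(damaged)[1])
```

A single parity block tolerates only one failed server per stripe; tolerating more simultaneous failures would require a stronger erasure code than the one assumed here.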