Fault-Tolerant Algorithms for the Concurrent-Push Architecture - Scalable Continuous Media Streaming Systems


11 Fault-Tolerant Algorithms for the Concurrent-Push Architecture

One potential problem with the concurrent-push architecture, and with parallel-server architectures in general, is reliability. Because the system distributes video data over multiple servers, the failure of a single server will cripple the entire system. Worse still, as the system is scaled up to serve more users, more servers are needed, and system-wide reliability decreases accordingly. Drawing on similar principles from disk array research, we present in this chapter fault-tolerant algorithms to improve system reliability.
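To make the scaling argument concrete, the following is a minimal sketch of how reliability might fall as servers are added. It assumes that servers fail independently, that lifetimes are exponentially distributed, and that the system is up only when every server is up. These modelling assumptions and the function names are illustrative; they are not taken from the chapter.

```python
def system_availability(server_availability: float, num_servers: int) -> float:
    """Probability that all servers are up at once.

    Assumes independent server failures (an illustrative assumption):
    without redundancy, the system works only if every server works.
    """
    return server_availability ** num_servers


def system_mttf(server_mttf_hours: float, num_servers: int) -> float:
    """Mean time to the first server failure, in hours.

    Assumes independent, exponentially distributed server lifetimes,
    for which the first failure among N servers arrives N times sooner.
    """
    return server_mttf_hours / num_servers


if __name__ == "__main__":
    # Hypothetical figures chosen only to show the trend.
    for n in (1, 4, 16, 64):
        print(n, system_availability(0.999, n), system_mttf(50_000.0, n))
```

Under these assumptions, both measures degrade monotonically with the number of servers, which is why redundancy becomes more important as the system grows.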

In particular, we address three key problems in supporting fault tolerance in the concurrent-push architecture: redundancy management, the redundant data transmission protocol, and real-time fault masking. First, redundant data based on erasure codes is added to the video data stored in the servers and is then delivered to the clients to support fault tolerance. Despite the success of distributed redundancy striping schemes such as RAID-5 in disk array implementations, we find that similar schemes extended to the server context do not scale well. Instead, we develop a redundant server scheme that is both scalable and consumes less server buffer. Second, two protocols are introduced to control the transmission of redundant data to the clients: forward erasure correction (FEC) and progressive redundancy transmission (PRT). These two protocols achieve different tradeoffs among bandwidth overhead, implementation complexity, and client buffer requirement. Finally, we derive the amount of client buffer required to maintain non-stop, continuous video playback during a server failure.

11.1 Redundancy Management

To support server-level fault tolerance, we need redundant data so that a client can recompute the unavailable video data after a server failure. The problem of correcting data errors has been studied extensively in the literature. According to coding theory [1], one can encode a set of symbols with redundancy so that errors occurring within the set can be corrected later. A server failure is slightly different, however, in that there is no error in the coding sense. Instead, a server failure introduces erasures, that is, the absence of data.
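The distinction matters because the position of an erasure is known: the client knows which server has stopped sending. As a minimal sketch, assume the simplest erasure code, a single XOR parity block computed over equal-sized blocks held by different servers. The chapter does not specify this code at this point, and the function names below are hypothetical.

```python
from functools import reduce
from typing import List, Optional


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode_parity(data_blocks: List[bytes]) -> bytes:
    """Compute one parity block over the data blocks (single-parity code)."""
    return xor_blocks(data_blocks)


def recover_erasure(received: List[Optional[bytes]]) -> List[bytes]:
    """Return the data blocks, rebuilding at most one erased block.

    `received` holds the data blocks followed by the parity block.
    A block from a failed server is represented as None (an erasure).
    """
    missing = [i for i, block in enumerate(received) if block is None]
    if len(missing) > 1:
        raise ValueError("a single parity block can repair only one erasure")
    repaired = list(received)
    if missing:
        survivors = [block for block in received if block is not None]
        # XOR of all surviving blocks equals the erased block.
        repaired[missing[0]] = xor_blocks(survivors)
    return repaired[:-1]  # drop the parity block


if __name__ == "__main__":
    data = [b"abcd", b"efgh", b"ijkl"]
    parity = encode_parity(data)
    # The second server has failed, so its block never arrives.
    print(recover_erasure([data[0], None, data[2], parity]) == data)
```

Because the client knows which block is absent, one redundant block suffices to repair one erasure in this sketch. Correcting an error at an unknown position would require more redundancy.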
