Fault-Tolerant Algorithms for the Concurrent-Push Architecture - Scalable Continuous Media Streaming Systems


to avoid false alarms. We present below an admission-scheduler-based (ASB) protocol for

detecting server failures.

In our previous investigations, we found that incoming control requests could be delayed

for a substantial amount of time (e.g., more than one second) due to intense I/O activities at

the servers. Consequently, it would be more difficult to implement server-based fault-detection

protocols that can quickly detect a failure. This motivates us to implement fault-detection

at the admission scheduler rather than at the servers. The admission scheduler is originally

introduced to tackle the uneven group assignment problem arising from server clock jitters (cf.

Section 10.4). For fault detection, we extend the admission scheduler to simulate a video client.

Unlike real video clients, however, received video data are simply discarded at the admission

scheduler after bookkeeping is done; the scheduler never performs any interactive control,

and the stream never terminates (until system shutdown). At the servers, video data destined

for the admission scheduler are not retrieved from the disks, but rather generated on-the-fly.

Since the generated video data will not be interpreted at the admission scheduler, the server can

avoid disk overhead by sending the same buffer repeatedly after updating header information

such as stream offset or sequence number.
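The buffer-reuse idea above can be sketched as follows. This is only an illustrative sketch: the header layout (an 8-byte stream offset followed by a 4-byte sequence number), the payload size, and the names DummyStreamSource and next_packet are assumptions made for the example and are not specified in the text.

```python
import struct

# Hypothetical header layout (an assumption, not from the text):
# 8-byte stream offset followed by a 4-byte sequence number, network byte order.
HEADER = struct.Struct("!QI")


class DummyStreamSource:
    """Generates packets for the admission scheduler without any disk access."""

    def __init__(self, payload_size=1024):
        self.payload_size = payload_size
        # The buffer is allocated once; its payload bytes are never rewritten
        # because the admission scheduler does not interpret them.
        self.buffer = bytearray(HEADER.size + payload_size)
        self.offset = 0
        self.seq = 0

    def next_packet(self):
        # Overwrite only the header fields in place, then advance the counters.
        HEADER.pack_into(self.buffer, 0, self.offset, self.seq)
        self.offset += self.payload_size
        self.seq += 1
        return self.buffer
```

Because next_packet returns the same buffer object every time, a caller would have to transmit (or copy) it before requesting the next packet.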

When a server fails, it simply stops transmitting data. Hence, a server failure can be inferred

from the lack of video data received at the admission scheduler. We assume that the admission

scheduler is located close to the servers so that worst-case arrival deadlines are known for each

and every video packet. Then the admission scheduler can declare a server to have failed if the

arrival deadline is exceeded by a threshold of, say, T_ASB seconds. This threshold is introduced to

reduce the possibility of false alarms caused by unexpected data delivery delays or occasional

packet losses.
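The detection rule described above (declare a failure once a packet is overdue by more than T_ASB seconds) might be sketched as follows. The class and method names are hypothetical, and the sketch assumes that the caller supplies the worst-case arrival deadline of the next expected packet from each server; how those deadlines are computed is not covered here.

```python
class AsbFailureDetector:
    """Deadline-based failure detection at the admission scheduler (sketch)."""

    def __init__(self, t_asb):
        self.t_asb = t_asb    # threshold T_ASB, in seconds
        self.deadlines = {}   # server id -> deadline of the next expected packet

    def expect(self, server_id, deadline):
        # Called at start-up and again whenever a packet arrives from the
        # server, with the worst-case arrival deadline of its next packet.
        self.deadlines[server_id] = deadline

    def failed_servers(self, now):
        # A server is declared failed once its pending deadline has been
        # exceeded by more than T_ASB seconds.
        return sorted(
            server_id
            for server_id, deadline in self.deadlines.items()
            if now > deadline + self.t_asb
        )
```

In a running system, `now` would typically come from a monotonic clock such as Python's time.monotonic(), with deadlines expressed on the same clock. A larger T_ASB lowers the chance of a false alarm at the cost of slower detection.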

Note that the admission scheduler itself could also fail. However, this type of failure will

be less problematic because (a) while new streams cannot be started, the failure will not affect

existing streams; and (b) compared to the video servers, the admission scheduler is much

simpler and hence potentially far more reliable. For example, the admission scheduler can be

diskless so that disk failures are avoided, and ECC memory can be used to protect against memory

faults.

11.3.2 Server Reconfiguration for Block Striping

Upon declaring that a server has failed, the admission scheduler will send messages to

the surviving servers to notify them of the failure. The delay incurred will obviously be

implementation-dependent. For simplicity, we assume that the failure-detection delay is

bounded and the maximum is given by T_D seconds. Upon receiving the failure notification,

the servers will initiate a reconfiguration process to begin transmitting redundant blocks and

to retransmit the necessary redundant blocks.

Figure 11.4 depicts the scenario for reconfiguring a 5-server system under block striping.

Note that we consider only one video stream for illustration and analysis while in practice the

same process occurs for all active video streams. All algorithms and procedures still apply and

no modification is needed for the multi-stream case. Note also that redundant video blocks are

always retrieved, just not transmitted when there is no failure. One might notice that during

normal operation, some disk bandwidth would then be wasted in retrieving redundant blocks

that are not needed. It is conceivable that one can reuse this wasted bandwidth to serve extra

video sessions during normal operation. However, these sessions will have to be disconnected
