Algorithms for Server Rebuild - Scalable Continuous Media Streaming Systems

14 Algorithms for Server Rebuild

In the previous chapters we have introduced the many desirable features of parallel streaming

servers such as scalability and fault tolerance. In this and the next chapter we will address

two practical issues resulting from the use of striped server storage. In this chapter

we investigate the issue of server data rebuild, and in Chapter 15 the issues in expanding a

parallel server system.

Armed with redundant data and streaming capacity, a parallel streaming server can sustain

server-level failures and maintain non-stop media playback. However, the failed server will

eventually need to be repaired, or if repair is not feasible or desirable, replaced by a new

server. In the latter case we will need to load the appropriate media data into the new server

so that it can share the streaming workload - the server data rebuild problem. This chapter

investigates this problem and analyzes algorithms for rebuilding the data of a failed server onto a new

server transparently so that existing streaming sessions are not adversely affected.

14.1 Introduction

Armed with data and capacity redundancy, a parallel streaming server can operate in degraded

mode without causing any interruption to the existing streaming sessions. However, the failed

server will still need to be repaired or replaced and in the latter case we will also need to load

appropriate media data into the new server so that it can share some of the streaming workload.

This is referred to as the server data rebuild problem.

In the context of disk arrays and RAID [1], a similar data rebuild problem also exists, and

in Chapter 5 we have investigated rebuild algorithms for disk arrays. Despite the similarities,

a parallel server differs from RAID in that the bandwidth of the communications links is far more

limited. For example, when considering RAID in Chapter 5 we assumed that the data bus

connecting the hard disks and the main system was not the bottleneck and so the disks could

retrieve and store data into the system memory buffers as fast as the disks would allow. While

this is a reasonable assumption in RAID, in a parallel server the network linking up the servers

will likely have more limited bandwidth, e.g., 1 Gbps using Gigabit Ethernet [2] switches. Thus,

the network may become the bottleneck in the data rebuild process.
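
To make the bottleneck argument concrete, the following is a minimal sketch, not an algorithm from this chapter. It estimates how long reloading a replacement server would take when the rebuild rate is capped by the slower of the disk and the network. The function name, the 500 GB data volume, the 200 MB/s disk rate, and the rebuild_share parameter (the fraction of the link set aside for rebuild traffic so that streaming is not disturbed) are all illustrative assumptions; only the 1 Gbps link speed comes from the text.

```python
def rebuild_time(data_bytes, disk_bytes_per_s, link_bits_per_s, rebuild_share=1.0):
    """Estimate the time to reload data_bytes into a replacement server.

    The effective rebuild rate is the slower of the disk rate and the part
    of the network link reserved for rebuild traffic. All parameters are
    hypothetical and chosen for illustration only.
    """
    if not 0.0 < rebuild_share <= 1.0:
        raise ValueError("rebuild_share must be in (0, 1]")
    # Convert the reserved share of the link from bits/s to bytes/s.
    network_bytes_per_s = link_bits_per_s * rebuild_share / 8
    rate = min(disk_bytes_per_s, network_bytes_per_s)
    bottleneck = "network" if network_bytes_per_s < disk_bytes_per_s else "disk"
    return data_bytes / rate, bottleneck


if __name__ == "__main__":
    DATA = 500 * 10**9   # assumed: 500 GB of media data to reload
    DISK = 200 * 10**6   # assumed: 200 MB/s sustained disk rate
    LINK = 10**9         # 1 Gbps Gigabit Ethernet link, as in the text

    for share in (1.0, 0.25):
        seconds, limit = rebuild_time(DATA, DISK, LINK, rebuild_share=share)
        print(f"share={share:.2f}: {seconds / 3600:.2f} h, limited by {limit}")
```

Under these assumed figures the full 1 Gbps link carries only 125 MB/s, less than the assumed disk rate, so the network rather than the disk determines the rebuild time. Reserving a smaller share of the link for rebuild traffic lengthens the rebuild proportionally.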
