14
Algorithms for Server Rebuild
In the previous chapters we have introduced the many desirable features of parallel streaming
servers such as scalability and fault tolerance. In this and the next chapter we will address
two practical issues resulting from the use of striped server storage. This chapter
investigates server data rebuild, and Chapter 15 examines the issues in expanding a
parallel server system.
Armed with redundant data and streaming capacity, a parallel streaming server can sustain
server-level failures and maintain non-stop media playback. However, the failed server will
eventually need to be repaired, or if repair is not feasible or desirable, replaced by a new
server. In the latter case we will need to load the appropriate media data into the new server
so that it can share the streaming workload: this is the server data rebuild problem. This chapter
investigates the problem and analyzes algorithms for rebuilding the data of a failed server onto a new
server transparently, so that existing streaming sessions are not adversely affected.
14.1 Introduction
Armed with data and capacity redundancy, a parallel streaming server can operate in degraded
mode without causing any interruption to the existing streaming sessions. However, the failed
server will still need to be repaired or replaced and in the latter case we will also need to load
appropriate media data into the new server so that it can share some of the streaming workload.
This is referred to as the server data rebuild problem.
In the context of disk arrays and RAID [1], a similar data rebuild problem also exists, and
in Chapter 5 we have investigated rebuild algorithms for disk arrays. Despite the similarities,
a parallel server differs from RAID in that the bandwidth of the communication links is far more
limited. For example, when considering RAID in Chapter 5 we assumed that the data bus
connecting the hard disks and the main system was not the bottleneck, and so the disks could
retrieve and store data into the system memory buffers as fast as the disks would allow. While
this is a reasonable assumption in RAID, in a parallel server the network linking up the servers
will likely have more limited bandwidth, e.g., 1 Gbps using Gigabit Ethernet [2] switches. Thus,
the network may become the bottleneck in the data rebuild process.
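To make the bottleneck argument concrete, the following is a minimal back-of-the-envelope sketch, not an algorithm from this chapter. It assumes the rebuild rate is bounded by the slower of the disk transfer rate and the network link rate, and it ignores bandwidth consumed by ongoing streaming sessions. The function name rebuild_estimate and all numeric values other than the 1 Gbps link rate are hypothetical and chosen only for illustration.

```python
def rebuild_estimate(data_bytes, disk_bps, network_bps):
    """Return (lower bound on rebuild time in seconds, name of the bottleneck).

    Assumption: data can be rebuilt no faster than the slower of the
    disk rate and the network rate (both given in bits per second).
    """
    rate_bps = min(disk_bps, network_bps)
    bottleneck = "network" if network_bps < disk_bps else "disk"
    return data_bytes * 8 / rate_bps, bottleneck


if __name__ == "__main__":
    # Hypothetical figures: 100 GB to rebuild, disks able to sustain 4 Gbps
    # in aggregate, and a 1 Gbps Gigabit Ethernet link into the new server.
    seconds, limit = rebuild_estimate(100e9, disk_bps=4e9, network_bps=1e9)
    print(f"at least {seconds:.0f} s, limited by the {limit}")
```

Under these assumed figures the network, not the disks, sets the lower bound on rebuild time, which is the opposite of the RAID setting considered in Chapter 5.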