Chapter 7
A Replica Based Co-Scheduler
(RBS) for Fault Tolerant
Computational Grid
Zahid Raza
Jawaharlal Nehru University, India
Deo Prakash Vidyarthi
Jawaharlal Nehru University, India
ABSTRACT
A Grid is a parallel and distributed computing system comprising heterogeneous computing
resources spread over multiple administrative domains that offers high-throughput computing. Since
the Grid operates at a large scale, there is always a possibility of failure, ranging from hardware to
software. The penalty paid for these failures may be very large. The system needs to be tolerant
to the various possible failures which, in spite of many precautions, are bound to happen. Replication is a
strategy often used to introduce fault tolerance in the system to ensure successful execution of the job,
even when some of the computational resources fail. Though replication incurs a heavy cost, a selective
degree of replication can offer a good compromise between the performance and the cost. This chapter
proposes a co-scheduler that can be integrated with the main scheduler for the execution of the jobs submitted
to a computational Grid. The main scheduler may have any performance optimization criteria; the
integration of the co-scheduler will be an added advantage towards fault tolerance. The chapter evaluates
the performance of the co-scheduler with the main scheduler designed to minimize the turnaround time
of a modular job by introducing module replication to counter the effects of node failures in a Grid.
A simulation study reveals that the model works well under various conditions, resulting in a graceful
degradation of the scheduler's performance while improving the overall reliability offered to the job.