A Replica Based Co-Scheduler (RBS) for Fault Tolerant Computational Grid - Cloud, Grid and High Performance Computing: Emerging Applications

Chapter 7

A Replica Based Co-Scheduler (RBS) for Fault Tolerant Computational Grid

Zahid Raza

Jawaharlal Nehru University, India

Deo Prakash Vidyarthi

Jawaharlal Nehru University, India

ABSTRACT

A Grid is a parallel and distributed computing system comprising heterogeneous computing resources spread over multiple administrative domains, offering high-throughput computing. Because the Grid operates at a large scale, failures ranging from hardware to software are always possible, and the penalty for such failures can be very large. The system therefore needs to tolerate the failures that, despite many precautions, are bound to happen. Replication is a strategy often used to introduce fault tolerance, ensuring successful execution of a job even when some of the computational resources fail. Though replication incurs a heavy cost, a selective degree of replication can offer a good compromise between performance and cost. This chapter proposes a co-scheduler that can be integrated with the main scheduler for the execution of jobs submitted to a computational Grid. The main scheduler may have any performance optimization criterion; integrating the co-scheduler adds fault tolerance. The chapter evaluates the performance of the co-scheduler with a main scheduler designed to minimize the turnaround time of a modular job, introducing module replication to counter the effects of node failures in a Grid. A simulation study reveals that the model works well under various conditions, resulting in a graceful degradation of the scheduler's performance while improving the overall reliability offered to the job.
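The trade-off between the degree of replication and the reliability offered to a job can be illustrated with a small calculation. The sketch below is not the chapter's model. It assumes that replicas of a module fail independently and that every node succeeds with the same probability, and the function names are hypothetical, chosen only for illustration.

```python
def replicated_reliability(node_reliability: float, replicas: int) -> float:
    """Probability that at least one of `replicas` copies of a module succeeds.

    Assumption (not from the chapter): replicas fail independently and each
    node succeeds with the same probability `node_reliability`.
    """
    if not 0.0 <= node_reliability <= 1.0:
        raise ValueError("node_reliability must lie in [0, 1]")
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    # The module fails only if every replica fails.
    return 1.0 - (1.0 - node_reliability) ** replicas


def min_replicas(node_reliability: float, target: float) -> int:
    """Smallest replica count whose combined reliability reaches `target`."""
    if not 0.0 < node_reliability <= 1.0:
        raise ValueError("node_reliability must lie in (0, 1]")
    if not 0.0 <= target < 1.0:
        raise ValueError("target must lie in [0, 1)")
    count = 1
    while replicated_reliability(node_reliability, count) < target:
        count += 1
    return count


if __name__ == "__main__":
    # Each extra replica costs one more node but adds less reliability
    # than the previous one.
    for k in range(1, 5):
        print(k, replicated_reliability(0.9, k))
```

Under these assumptions each additional replica contributes less reliability than the one before it, which is one way to read the abstract's point that a selective degree of replication can balance performance against cost.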
