A Replica Based Co-Scheduler (RBS) for Fault Tolerant Computational Grid - Cloud, Grid and High Performance Computing: Emerging Applications


Figure 4. Reliability With and Without Replication

acceptance level of the failures. Accordingly, the nodes are judged as healthy or sick. For the sick nodes, information from the cluster database is used to check whether any allocations for the job have been made on them. If so, the allocated modules are replicated on healthy nodes, which are selected at random from among all the healthy nodes. This results in duplicate copies of the modules on more than one node. If a sick node fails, the duplicate copies of the modules allocated to that node can be found on the healthy nodes, and job execution continues. After this operation, the modules are allocated to the nodes as per the original schedule, and duplicate copies exist for the modules lying on the failure-prone nodes. If no failure occurs, the job executes as planned; if node failures are detected, the system does not succumb to them but recovers gracefully at some additional computational cost. The model does not replicate all the modules on all the nodes, only the modules on susceptible nodes, thus avoiding the overall execution cost that full replication would incur.
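The selective replication described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the per-node failure rates, the `threshold` value standing in for the acceptance level, the `schedule` dictionary standing in for the cluster database, and all function names are assumptions made for illustration.

```python
import random
from typing import Dict, Set, Tuple


def classify_nodes(failure_rates: Dict[str, float],
                   threshold: float) -> Tuple[Set[str], Set[str]]:
    """Split nodes into (healthy, sick) by comparing an assumed
    failure metric against an assumed acceptance level."""
    healthy = {n for n, rate in failure_rates.items() if rate <= threshold}
    sick = set(failure_rates) - healthy
    return healthy, sick


def replicate_sick_modules(schedule: Dict[str, str],
                           healthy: Set[str],
                           sick: Set[str],
                           rng: random.Random) -> Dict[str, str]:
    """Return {module: replica_node} for modules placed on sick nodes.
    Replica nodes are picked at random from the healthy nodes."""
    candidates = sorted(healthy)  # sorted so a seeded rng is reproducible
    replicas: Dict[str, str] = {}
    if not candidates:
        return replicas  # no healthy node available to host replicas
    for module, node in schedule.items():
        if node in sick:
            replicas[module] = rng.choice(candidates)
    return replicas


def recover(schedule: Dict[str, str],
            replicas: Dict[str, str],
            failed_node: str) -> Dict[str, str]:
    """Map each module of the failed node to the node holding its replica."""
    return {m: replicas[m]
            for m, n in schedule.items()
            if n == failed_node and m in replicas}


# Hypothetical data: failure rates and the original schedule (module -> node).
failure_rates = {"n1": 0.02, "n2": 0.30, "n3": 0.05, "n4": 0.45}
schedule = {"m1": "n1", "m2": "n2", "m3": "n3", "m4": "n4", "m5": "n2"}

healthy, sick = classify_nodes(failure_rates, threshold=0.10)
replicas = replicate_sick_modules(schedule, healthy, sick, random.Random(0))
takeover = recover(schedule, replicas, failed_node="n2")
```

In this example only the three modules on the two sick nodes receive replicas, whereas full replication would duplicate all five modules. The original schedule is left unchanged, so a run without failures proceeds as planned.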

The RBS can therefore be used as a co-scheduler alongside any scheduler available with the grid middleware to increase fault tolerance. Including RBS enables the grid to respond gracefully to node failures with a small increase in cost and a small compromise in grid performance. This is unavoidable, since the replicated modules have an altered sequence of execution compared with the original schedule.
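One way to picture the co-scheduler arrangement is as a wrapper around an arbitrary primary scheduler. The sketch below is hypothetical: `with_co_scheduler`, `round_robin`, and `replicate_from_n2` are illustrative names and do not come from the chapter or from any grid middleware. It only shows that the primary schedule passes through untouched while replica placements are produced alongside it.

```python
from typing import Callable, Dict, List, Tuple

Schedule = Dict[str, str]  # module -> node
Primary = Callable[[List[str], List[str]], Schedule]
CoScheduler = Callable[[Schedule], Schedule]


def with_co_scheduler(primary: Primary, co_scheduler: CoScheduler
                      ) -> Callable[[List[str], List[str]],
                                    Tuple[Schedule, Schedule]]:
    """Wrap any primary scheduler so that a co-scheduler adds replica
    placements without modifying the primary schedule."""
    def scheduled(modules: List[str], nodes: List[str]
                  ) -> Tuple[Schedule, Schedule]:
        schedule = primary(modules, nodes)
        replicas = co_scheduler(dict(schedule))  # pass a copy, keep original intact
        return schedule, replicas
    return scheduled


def round_robin(modules: List[str], nodes: List[str]) -> Schedule:
    """Stand-in primary scheduler: assign modules to nodes in turn."""
    return {m: nodes[i % len(nodes)] for i, m in enumerate(modules)}


def replicate_from_n2(schedule: Schedule) -> Schedule:
    """Stand-in co-scheduler: assume n2 is sick and n1 is healthy."""
    return {m: "n1" for m, n in schedule.items() if n == "n2"}


fault_tolerant = with_co_scheduler(round_robin, replicate_from_n2)
schedule, replicas = fault_tolerant(["m1", "m2", "m3"], ["n1", "n2"])
```

Because the wrapper accepts any callable with the same shape, the primary scheduler can be swapped without changing the replication step.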

Use of such a co-scheduler is an added advantage for the grid system because, without it, the job must be scheduled afresh upon encountering failures. Rescheduling wastes computational energy, which may prove very costly in a high-traffic environment like the grid. For real-time jobs the problem becomes much more severe, as the failures may impact grid performance and thus hurt the financial prospects of the grid. The use of RBS does not affect the objective of the main scheduler allocating the job; instead, it helps it by providing the necessary support against failures. Experimental study
