Information Technology Reference
In-Depth Information
Figure 4. Reliability With and Without Replication
acceptance level of the failures. Accordingly, the
nodes are judged as healthy or sick nodes. For the
sick nodes, information from the cluster database
is used to check whether any allocations for the job have
been made on them. If so, the allocated
modules are replicated on the healthy nodes. The
reallocation is done based on a random selection
of nodes out of all the healthy nodes. This results
in duplicate copies of the modules on more than
one node. In case of failure of any sick node, the
duplicate copies of the modules allocated to that
node can be found on the other healthy nodes
for the continuation of the job execution. After this
operation, the modules remain allocated on the
nodes as per the original schedule, along with
duplicate copies of the modules lying on the
failure-prone nodes. If no failure occurs, the
job is executed as planned; if node failures
are detected, the system does not succumb to
them but recovers gracefully at
some additional computational cost. The model
does not replicate all the modules on all the nodes,
only the modules on susceptible nodes. This
avoids the higher overall execution cost that
full replication would incur.
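
The selective replication step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the use of a per-node failure rate as the health measure, and the dictionary layout of the allocation are all assumptions made for the example. Only the overall logic (classify nodes against an acceptance level, then copy the modules of sick nodes to randomly chosen healthy nodes) comes from the text.

```python
import random


def classify_nodes(failure_rates, acceptance_level):
    """Split nodes into healthy and sick lists.

    failure_rates: dict mapping node name -> observed failure rate
                   (hypothetical health measure, assumed for this sketch).
    acceptance_level: highest failure rate still considered healthy.
    """
    healthy = [n for n, rate in failure_rates.items() if rate <= acceptance_level]
    sick = [n for n, rate in failure_rates.items() if rate > acceptance_level]
    return healthy, sick


def replicate_sick_modules(allocation, failure_rates, acceptance_level, rng=None):
    """Return a dict mapping module -> healthy node that holds its duplicate.

    allocation: dict mapping node name -> list of modules scheduled on it
                (stands in for the cluster database lookup).
    Only modules on sick nodes are replicated; the original schedule is untouched.
    """
    rng = rng or random.Random()
    healthy, sick = classify_nodes(failure_rates, acceptance_level)
    replicas = {}
    if not healthy:
        # No healthy node is available to host duplicates.
        return replicas
    for node in sick:
        for module in allocation.get(node, []):
            # Random selection among all healthy nodes, as the text describes.
            replicas[module] = rng.choice(healthy)
    return replicas


# Example data (invented for illustration only).
failure_rates = {"n1": 0.10, "n2": 0.60, "n3": 0.05}
allocation = {"n1": ["m1"], "n2": ["m2", "m3"], "n3": ["m4"]}
replicas = replicate_sick_modules(allocation, failure_rates, 0.30, random.Random(0))
```

In this example only the two modules on the sick node `n2` receive duplicates, whereas full replication would copy all four modules; this difference is the cost saving the text refers to.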
The RBS can therefore be used along with any
scheduler available with the grid middleware as a
co-scheduler to increase the fault tolerance. The
inclusion of RBS enables the grid to respond
gracefully to node failures with a small increase in
cost and a small compromise in the performance of
the grid. This is unavoidable since the replicated
modules have an altered sequence of execution
as compared with the original schedule.
Use of such a co-scheduler is an added advantage
for the grid system because, without it, the job
needs to be scheduled afresh upon encountering
failures. This wastes
computational energy, which may prove very
costly in a high-traffic environment like a grid.
For real-time jobs the problem becomes much
more severe, as the failures may degrade the grid
performance and thus hurt the financial prospects
of the grid. The use of RBS does not affect
the objective of the main scheduler allocating
the job. Instead, it helps the main scheduler by
providing the necessary support against failures.
Experimental study