elaborating the need for RBS and its integration with
a main scheduler. The working of the model is
illustrated next using a suitable example, along with
details of the results obtained from the simulation
study. The chapter concludes by detailing the
achievements and drawbacks of the work.
Kuppuswam & Ragupathi, 2006). Many more
similar models are also available in the literature.
THE REPLICA BASED CO-SCHEDULER (RBS)
Replication can be applied in many ways to grid
constituents to induce fault tolerance in the system.
Depending on requirements and availability, it
can be used at the hardware or the software level.
These techniques work well irrespective of the
allocation strategy used by the scheduler, but at
an increased cost of execution, in terms of both
computational power and money. The degree and
type of replication introduced thus depend on
the number of failures the system can tolerate.
Since the grid is a heterogeneous environment,
failures may occur at many levels: the job may
fail at the time of submission, the computational
resource may fail while the job is being scheduled
or after it has been scheduled, and the network
links may fail while the job is interacting with the
user or within itself. Among these, failures of
resources or applications before scheduling do not
have a serious effect, as the affected jobs can be
taken up again for scheduling. The problem is
serious when resources fail while executing jobs.
The most disastrous case is the failure of the node
on which the job is executing. Robustness against
application and network failures is difficult to
attain, but node failure can be handled somewhat
more easily if information about the modules (jobs)
allocated to that node is available.
The proposed Replica Based Co-Scheduler
(RBS) supports reliable execution of a modular
job by replicating the modules allocated to
nodes with high failure rates (sick nodes) onto
nodes with lower failure rates (healthy nodes).
Each module is reallocated only once, to a node
selected at random from among all the healthy
nodes. This results in having duplicate
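
The replication step described above could be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the dictionary-based data layout, and the use of a fixed failure-rate threshold to separate sick nodes from healthy ones are all assumptions made for the example.

```python
import random


def replicate_sick_modules(allocation, failure_rate, threshold, rng=None):
    """Pick one replica node for every module placed on a sick node.

    allocation   : dict mapping module name -> node chosen by the main scheduler
    failure_rate : dict mapping node name -> failure rate
    threshold    : assumed cut-off; nodes at or above it are treated as sick
    rng          : optional random.Random instance, for reproducible runs
    """
    rng = rng or random.Random()
    # Healthy nodes are those whose failure rate is below the assumed threshold.
    healthy = sorted(n for n, r in failure_rate.items() if r < threshold)
    replicas = {}
    if not healthy:
        return replicas  # no healthy node available, nothing to replicate onto
    for module, node in allocation.items():
        if failure_rate[node] >= threshold:
            # One replica per module, on a randomly selected healthy node.
            replicas[module] = rng.choice(healthy)
    return replicas


allocation = {"m1": "n1", "m2": "n2", "m3": "n3"}
failure_rate = {"n1": 0.9, "n2": 0.1, "n3": 0.8, "n4": 0.05}
replicas = replicate_sick_modules(allocation, failure_rate, threshold=0.5,
                                  rng=random.Random(0))
```

Here the modules on `n1` and `n3` each receive one replica on either `n2` or `n4`, while the module already on a healthy node is left alone.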
RELATED MODELS
Because the grid is an aggregation of geographically
distributed heterogeneous resources, unreliable
behavior extends from the computational resources
to the running applications to the network media.
Reliable and fault-tolerant scheduling has attracted
considerable attention from researchers, and many
models addressing these issues have been reported
in the literature. A few models counter the effect
of these failures with solutions ranging from
proactive to reactive. A reliability analysis of grid
computing systems is presented in (Dai, Xie, & Poh, 2002).
An agent-oriented fault-tolerant framework has
been proposed in (Huda, Schmidt & Peake, 2005),
in which agents monitor the system so that, in case
of any threat, appropriate measures can be taken
beforehand to prevent failures. As a reactive measure,
a checkpoint-based mechanism has been adopted to
recover from failures by resuming from the last saved
state (Mujumdar, Bheevgade, Malik & Patrikar,
2008). Introducing redundancy is a popular
means of safeguarding the application, as reported
in many models in the literature. A study of the
tradeoff between performance and availability
has been carried out suggesting a file replication
strategy (Zhang & Honeyman, 2008). The use of
replication by determining the number of replicas
required and then suggesting a scheduling strat-
egy for the tasks submitted is reported in (Li &
Mascagni, 2003). Another fault tolerant strategy
using replication is proposed in (Liu, Wu, Ma, &
Cai, 2008), whereas a model using a database-centric
approach for static workloads in data grids has been
proposed in (Desprez & Vernois, 2007; Sathya,