Information Technology Reference
In-Depth Information
INTRODUCTION
One of the challenges of executing a job on the
grid is to ensure a reliable environment for the
job so that it can cope with any kind of failure.
Since grid resources are heterogeneous in behavior
and administrative control, introducing fault
tolerance into the system is very difficult. In
addition, the jobs submitted to the grid may
themselves be very complex and may take a long
time to execute, which makes them vulnerable to
failures. Further, the resources are under user
control, so accidental damage or a forced shutdown
may cause the execution to fail. The same is true
of network failures. Failures may thus arise in
hardware, in software, or in the network. Fault
tolerance techniques accordingly range from
proactive to reactive approaches that counter
failure at any level (Dai, Xie, & Poh, 2002; Huda,
Schmidt & Peake, 2005; Mujumdar, Bheevgade, Malik
& Patrikar, 2008). In spite of these measures, the
possibility of failure cannot be ruled out. The
objective is therefore to accept these failures and
minimize their effect by degrading the system
gracefully, continuing job execution at the cost of
reduced overall performance.
One of the popular mechanisms for handling failures
is replication. It can take a hardware or a
software form, in which the same application is
executed or stored on more than one resource.
With a slight increase in the execution cost,
replication therefore increases the probability
that the job executes successfully, which makes the
system fault tolerant.
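
The effect of replication on the probability of successful execution can be illustrated with a small calculation. The sketch below is not taken from the source: it assumes that replicas fail independently, and the function name `success_probability` and the failure probabilities used are purely illustrative.

```python
def success_probability(failure_probs):
    """Probability that at least one replica completes.

    Assumption (not from the source): replicas fail independently.
    """
    all_fail = 1.0
    for p in failure_probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError("failure probability must lie in [0, 1]")
        all_fail *= p  # every replica has to fail for the job to fail
    return 1.0 - all_fail


# Hypothetical numbers: one copy on a node that fails 20% of the time,
# then the same copy plus a replica on a node that fails 10% of the time.
single = success_probability([0.2])
replicated = success_probability([0.2, 0.1])
print(f"single copy: {single:.2f}, with replica: {replicated:.2f}")
```

Under these assumed numbers a single replica raises the success probability from 0.80 to 0.98, at the price of executing the module twice.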
Replication incurs a heavy cost, but this cost can
be minimized by adopting selective replication.
The selection of nodes or job modules depends on
certain parameters that the system can decide as
per the scheduling requirements. The RBS works by
replicating some of the modules allocated to a
node with a high failure rate onto nodes with a
lower failure rate. It therefore increases the
fault tolerance of the system without severely
affecting performance.
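
The idea of selective replication described above can be sketched as follows. This is only an illustration under stated assumptions, not the RBS algorithm itself: the threshold rule, the cap on modules replicated per node, and the tie-breaking by current load are all hypothetical choices, as are the names `plan_replicas`, `threshold` and `max_modules`.

```python
def plan_replicas(allocation, failure_rate, threshold, max_modules=1):
    """Return {module: target_node} for the modules chosen for replication.

    allocation:   {node: [module, ...]} produced by some scheduler
    failure_rate: {node: failure rate of that node}
    threshold:    nodes whose failure rate exceeds this count as risky
    max_modules:  how many modules of each risky node are replicated

    All selection rules here are assumptions made for illustration.
    """
    safe_nodes = [n for n in failure_rate if failure_rate[n] <= threshold]
    if not safe_nodes:
        return {}  # nowhere reliable to place a replica

    # Track load so replicas are spread over the reliable nodes.
    load = {n: len(allocation.get(n, [])) for n in safe_nodes}
    plan = {}
    for node, modules in allocation.items():
        if failure_rate[node] <= threshold:
            continue  # node is reliable enough, nothing to replicate
        for module in modules[:max_modules]:
            # Prefer the least loaded safe node, then the lowest failure rate.
            target = min(safe_nodes, key=lambda n: (load[n], failure_rate[n]))
            plan[module] = target
            load[target] += 1
    return plan


# Hypothetical allocation: n1 is failure-prone, n2 and n3 are not.
allocation = {"n1": ["m1", "m2"], "n2": ["m3"], "n3": ["m4"]}
failure_rate = {"n1": 0.30, "n2": 0.05, "n3": 0.10}
plan = plan_replicas(allocation, failure_rate, threshold=0.2)
print(plan)
```

Only modules on the risky node are copied, so the extra execution cost grows with the number of unreliable nodes rather than with the size of the whole job.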
This paper has six sections. The next section
discusses related work reported in the literature
with a similar objective, followed by a section
Computational resources are scarce and must
therefore be used efficiently. Resources may vary
from specialized computational machines and
storage machines to heterogeneous applications.
A grid aggregates resources across the world
seamlessly and enables their use as, when and
wherever desired, so that individual groups need
not invest heavily in high performance
computational resources. In the era of high
performance and high throughput computing, the
grid has emerged as an efficient means of
connecting distributed computers or resources
scattered all over the world for the purpose of
collaborative computing. It essentially unifies
various heterogeneous resources on a common
platform while diminishing administrative
boundaries to provide transparent access to the
user. Being part of the grid thus offers a user
virtually unlimited capability to execute any kind
of job anywhere. Therefore, even if the
appropriate computational capabilities are not
available with the user, the grid helps the job to
be executed on the right resources, which is both
efficient and cost effective.
Depending on their use, grids can be classified as
computational grids, data grids, sensor grids,
biological grids, etc. A computational grid
emphasizes the computing aspect: it schedules a
job onto grid resources by examining the
computational requirements of the job and
balancing the load effectively. Scheduling can be
based on various objectives, such as maximizing
the reliability of job execution, minimizing the
makespan, or maximizing the Quality of Service
(QoS) of the job execution (Grid Computing
Info Centre, 2008; Baker, Buyya, & Laforenza,
2002; Tarricone & Esposito, 2005; Ernemann,
Hamscher, & Yahyapour, 2002; Casanova, 2002;
Vidyarthi, Sarker, Tripathi & Yang, 2009; Raza
& Vidyarthi, 2008, 2009).
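
As an illustration of one of the scheduling objectives mentioned above, the makespan of a candidate allocation can be computed as the completion time of the busiest node. The sketch below rests on simplifying assumptions that are not from the source: modules on a node run one after another, and a module takes the same time on every node, which a heterogeneous grid would not guarantee. The names `makespan` and `exec_time` and all numbers are hypothetical.

```python
def makespan(assignment, exec_time):
    """Completion time of the busiest node.

    assignment: {node: [module, ...]}
    exec_time:  {module: execution time}, assumed identical on every node
    Modules on the same node are assumed to run sequentially.
    """
    return max(
        (sum(exec_time[m] for m in modules) for modules in assignment.values()),
        default=0.0,
    )


# Hypothetical execution times and two candidate allocations.
exec_time = {"m1": 4, "m2": 3, "m3": 5, "m4": 2}
unbalanced = {"n1": ["m1", "m2"], "n2": ["m3"], "n3": ["m4"]}
balanced = {"n1": ["m1"], "n2": ["m3"], "n3": ["m2", "m4"]}
print(makespan(unbalanced, exec_time), makespan(balanced, exec_time))
```

A scheduler that minimizes makespan would prefer the second allocation here, since moving one module off the busiest node shortens the overall completion time.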
Execution of a job on the complex and dynamic
grid poses a number of challenges, one of which is
reliable execution in the presence of failures.