copies of the modules on more than one node.
In case of a node failure, the duplicate copies of
the modules continue the job execution. The
duplicate copies are used only when a node fails;
otherwise the job is executed as per the originally
scheduled allocation. The job of the RBS starts
when the main scheduler has finished allocating
the job modules to the various nodes.
It is then that the RBS takes control to provide
robustness and fault tolerance to the cluster
containing the computational resources. The RBS can
be used along with any scheduler available in the
grid middleware. The inclusion of RBS enables
the grid to respond gracefully to node failures
at the cost of some grid performance,
which is unavoidable since the replicated
modules have an altered sequence of execution as
compared to the original schedule. The RBS strategy
provides an important backup, in the absence of which
the job needs to be scheduled afresh,
consuming computational energy that
proves very costly in a high-traffic environment
such as a grid. For real-time jobs the problem
becomes much more severe, as failures may
impact the grid performance and thus hurt the
financial prospects of the grid.
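The failover behaviour described above can be illustrated with a minimal sketch. It is not the RBS algorithm itself: the round-robin choice of backup node and the names `place_replicas` and `effective_allocation` are assumptions made only for illustration. The sketch shows the one property the text states, namely that a replica is used only when the primary node has failed.

```python
from typing import Dict, List, Set


def place_replicas(primary: Dict[str, str], nodes: List[str]) -> Dict[str, str]:
    """Give every module a backup node that differs from its primary node.

    Assumption: the backup is simply the next node in the list (round-robin);
    the source does not say how RBS actually chooses replica nodes.
    """
    if len(nodes) < 2:
        raise ValueError("replication needs at least two nodes")
    backup = {}
    for module, node in primary.items():
        idx = nodes.index(node)
        backup[module] = nodes[(idx + 1) % len(nodes)]
    return backup


def effective_allocation(primary: Dict[str, str],
                         backup: Dict[str, str],
                         failed: Set[str]) -> Dict[str, str]:
    """Keep the original allocation unless a module's primary node has failed."""
    result = {}
    for module, node in primary.items():
        if node not in failed:
            result[module] = node            # original schedule is untouched
        elif backup[module] not in failed:
            result[module] = backup[module]  # duplicate copy takes over
        else:
            raise RuntimeError(f"no surviving copy of module {module}")
    return result


nodes = ["n1", "n2", "n3"]
primary = {"m1": "n1", "m2": "n2", "m3": "n1"}
backup = place_replicas(primary, nodes)
print(effective_allocation(primary, backup, failed=set()))
print(effective_allocation(primary, backup, failed={"n1"}))
```

With no failures the second function returns the primary allocation unchanged; when `n1` fails, only the modules that were on `n1` move to their backup nodes.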
minimized. In the present work, the performance
of the RBS has been analyzed by integrating it
with a TSM scheduler.
The TSM model considers the grid as a collection
of many clusters, each with a specialization and
consisting of a number of nodes for job execution.
This is a multipoint entry grid in which a job can
be submitted at any node of the constituent clusters.
The main scheduler (TSM) searches for the
appropriate cluster, one that matches the job's
requirements and offers the minimum turnaround time,
and schedules the job on it. The
job is submitted for execution along with its Job
Precedence and Dependence Graph (JPDG), in
which the position of each module of the job
indicates its order of execution. The JPDG also depicts
the degree of parallelism and the interaction dependence of
each module with the preceding modules in terms
of the communication requirements.
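As a rough illustration of how a JPDG encodes execution order and degree of parallelism, the sketch below represents the graph as a mapping from each module to its predecessors. The representation and the names `execution_levels` and `parallelism_per_level` are assumptions, not the format used by TSM, and communication requirements on the edges are left out for brevity. The graph is assumed to be acyclic.

```python
from collections import defaultdict
from typing import Dict, List


def execution_levels(preds: Dict[str, List[str]]) -> Dict[str, int]:
    """Level of a module = length of the longest chain of predecessors.

    Modules with no predecessors sit at level 0. Assumes the graph is acyclic.
    """
    levels: Dict[str, int] = {}

    def level(module: str) -> int:
        if module not in levels:
            parents = preds[module]
            levels[module] = 0 if not parents else 1 + max(level(p) for p in parents)
        return levels[module]

    for module in preds:
        level(module)
    return levels


def parallelism_per_level(levels: Dict[str, int]) -> Dict[int, int]:
    """Number of modules at each level, i.e. how many could run concurrently."""
    width: Dict[int, int] = defaultdict(int)
    for lvl in levels.values():
        width[lvl] += 1
    return dict(sorted(width.items()))


# Hypothetical four-module job: m2 and m3 both depend on m1, m4 on both.
jpdg = {"m1": [], "m2": ["m1"], "m3": ["m1"], "m4": ["m2", "m3"]}
levels = execution_levels(jpdg)
print(levels)
print(parallelism_per_level(levels))
```

In this example `m2` and `m3` share a level, so the degree of parallelism at that level is two.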
The allocation status of the various jobs is
maintained with each cluster in a data structure
known as the Cluster Table (CT), which is updated
periodically to reflect the latest allocations. The CT
consists of the following attributes:
$C_n(S_n, P_k, f_k, \lambda_{lt}, M_{ij}, T_{prkn})$
where $C_n$ refers to the cluster under consideration
with specialization $S_n$, number of nodes $P_k$,
clock frequency of each node $f_k$, failure rate
of each node $\lambda_{lt}$, modules assigned on the nodes
$M_{ij}$, and time to finish the existing modules on
the nodes $T_{prkn}$. The CT thus provides
information about the cluster constituents:
the specialization of the cluster nodes, which helps in
allocating jobs to appropriate resources as
per their requirements and specifications; the number
of nodes in the cluster; their clock frequency; the
failure rate of the nodes; the present allocation; and the
time taken to finish the modules already
allocated on the nodes. The main scheduler in this
case is TSM, but it can be any scheduler proposing
a scheduling strategy for the modular job. Since the
objective of the TSM is to minimize the turnaround
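The Cluster Table attributes listed above map naturally onto a small record type. The following is a minimal sketch only: the field names, the per-node list layout, and the helper methods `allocate` and `earliest_free_node` are assumptions for illustration and are not taken from the TSM implementation. Units for clock frequency and time are left unspecified, as in the text.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ClusterTable:
    specialization: str                    # S_n
    num_nodes: int                         # P_k
    clock_freq: List[float]                # f_k, one entry per node
    failure_rate: List[float]              # lambda, one entry per node
    assigned: Dict[int, List[str]] = field(default_factory=dict)  # M_ij: node -> modules
    time_to_finish: List[float] = field(default_factory=list)     # T_prkn, per node

    def __post_init__(self) -> None:
        # A freshly created cluster has no pending work on any node.
        if not self.time_to_finish:
            self.time_to_finish = [0.0] * self.num_nodes

    def allocate(self, node: int, module: str, exec_time: float) -> None:
        """Record a module on a node and extend that node's time to finish."""
        self.assigned.setdefault(node, []).append(module)
        self.time_to_finish[node] += exec_time

    def earliest_free_node(self) -> int:
        """Index of the node that finishes its existing modules first."""
        return min(range(self.num_nodes), key=lambda k: self.time_to_finish[k])


ct = ClusterTable("matrix", 2, clock_freq=[2.0, 3.0], failure_rate=[0.01, 0.02])
ct.allocate(0, "m1", exec_time=5.0)
print(ct.earliest_free_node())
```

A scheduler that aims at low turnaround time could consult `time_to_finish` in this way before placing the next module, which is why the table has to be refreshed whenever allocations change.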
INTEGRATION OF RBS WITH TSM
To analyze the performance of the co-scheduler
RBS, it is essential to have a scheduler that
schedules the job submitted to the grid on
appropriate resources based on a certain
optimization parameter. This parameter may vary, e.g.
turnaround time, reliability, security, Quality of
Service (QoS), etc. Minimizing the turnaround time
of the submitted job is often the desired objective
and has been addressed in the Turnaround Based
Scheduling Model (TSM) for computational grids
using a Genetic Algorithm (GA) in [8]. The TSM
model uses GA to schedule a modular job on a
cluster-based grid and suggests an allocation pattern
such that the turnaround time of the job is