$$E_{ijkn} = I_i \cdot \frac{1}{f_k} + n \cdot \alpha \qquad \text{(iii)}$$
In eq. (v), $ClusRel_{jn}$, as stated in eq. (iii), is the reliability offered to the job $J_j$ without node failure, and ${}^{K}C_{I}$ accounts for the failure of $I$ nodes out of the available $K$ nodes on which the original allocation has been made.
$x_{ijk}$ is the vector indicating the assignment of module $m_i$ of job $J_j$ on node $P_k$. It assumes a binary value: 1 if the module is allocated to the node and 0 otherwise. $T_{prkn}$ is the time to finish execution of the present modules on the node $P_k$.
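The factor $w \sum_{h=1}^{i-1} (B_{ihj} \cdot D_{kln})\, x_{ijk}\, x_{hjl}$ represents the communication cost between a module $m_i$ and its previous modules $m_h$, as per the JPDG.

RBS Algorithm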
The TSM essentially schedules the job on the cluster offering the minimum turnaround time, chosen from the group of clusters whose specialization matches the job. Once the cluster is selected for job allocation, its Cluster Table (CT) is updated to accommodate the new job. The job of the RBS begins where the job of the TSM finishes. For the selected cluster, the RBS evaluates the vulnerability of the nodes on which an allocation has been made by comparing their failure rates $\lambda_{lt}$ with a threshold failure rate $\lambda_{th}$, which depends on the domain knowledge of the cluster and on the acceptable level of failures. Accordingly, the nodes are judged as healthy or sick. For the sick nodes, the CT is consulted to check for any allocations made to them. These modules are then duplicated on a healthy node, selected randomly. The algorithm for the same is shown in the box.
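The replication step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the boxed algorithm itself: a node is treated as sick when its failure rate exceeds the threshold, the Cluster Table is modelled as a plain dictionary from node to allocated modules, and the names `replicate_sick_modules`, `failure_rates`, and `allocations` are hypothetical.

```python
import random


def replicate_sick_modules(failure_rates, allocations, threshold, rng=None):
    """Return {module: backup_node} for modules allocated to sick nodes.

    failure_rates: {node: failure rate}
    allocations:   {node: [modules]}, a stand-in for the Cluster Table
    threshold:     threshold failure rate
    """
    rng = rng or random.Random()
    # Assumption: "sick" means the failure rate exceeds the threshold.
    sick = [n for n, rate in failure_rates.items() if rate > threshold]
    healthy = [n for n in failure_rates if n not in sick]
    replicas = {}
    if not healthy:
        # No healthy node is available to host a copy.
        return replicas
    for node in sick:
        for module in allocations.get(node, []):
            # Duplicate the module on a randomly selected healthy node.
            replicas[module] = rng.choice(healthy)
    return replicas
```

Passing a seeded `random.Random` instance as `rng` makes the random choice of backup node reproducible, which is convenient when comparing allocation runs.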
Now, if a failure is detected, the system does not fail completely, because copies of the modules on the failed node are still available on some other nodes. The execution of the job still follows the JPG, with the penalty of an increased turnaround
time. It is due to some nodes waiting for the pre-
In the communication cost between a module $m_i$ and its previous modules $m_h$ as per the JPDG, $B_{ihj}$ is the number of bytes that need to be exchanged between modules $m_i$ and $m_h$, and $D_{kln}$ is the Hamming distance between the nodes $P_k$ and $P_l$ involved in the data exchange. $w$ is the scaling factor that converts the term $\sum_{h=1}^{i-1} (B_{ihj} \cdot D_{kln})\, x_{ijk}\, x_{hjl}$ into time units.
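As a rough illustration of the communication cost term, the sketch below evaluates the scaled sum for one module. It makes several assumptions that are not in the text: nodes are identified by integer addresses, so that the Hamming distance is the number of differing address bits; modules are indexed from 0; and the binary variables $x_{ijk}$ and $x_{hjl}$ are replaced by a list `node_of` that gives the node hosting each module. All function and parameter names are hypothetical.

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two node addresses differ."""
    return bin(a ^ b).count("1")


def communication_cost(i, node_of, bytes_between, w):
    """Scaled communication cost of module i with its previous modules.

    node_of:       list, node_of[m] is the address of the node hosting module m
    bytes_between: {(i, h): bytes exchanged between modules i and h}
    w:             scaling factor converting byte-hops into time units
    """
    total = 0
    for h in range(i):  # previous modules h = 0 .. i-1 (0-based indexing)
        b = bytes_between.get((i, h), 0)
        total += b * hamming_distance(node_of[i], node_of[h])
    return w * total


# Three modules placed on nodes 000, 011 and 101.
node_of = [0b000, 0b011, 0b101]
bytes_between = {(1, 0): 100, (2, 0): 50, (2, 1): 20}
print(communication_cost(2, node_of, bytes_between, w=0.5))  # 70.0
```

Module 2 is two hops away from both earlier modules, so the sum is 50·2 + 20·2 = 140 byte-hops, which the scaling factor 0.5 turns into 70 time units.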
The reliability offered by the cluster of the grid, $ClusRel_{jn}$, as per the allocation pattern suggested by the chromosome, can be written as shown in Box 1, where $ModRel_{ik}$ is the reliability offered by the grid when module $m_i$ has been assigned on node $P_k$. Introduction of replicated modules increases the reliability of the job execution. At any time, the reliability offered to the job with replication, $ClusRelRep_{jn}$, can be written as

$$ClusRelRep_{jn} = ClusRel_{jn} + {}^{K}C_{I} \cdot ClusRel_{jn} \qquad \text{(v)}$$
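Box 1.

$$ClusRel_{jn} = \prod_{i=1}^{M} ModRel_{ik} \qquad \text{(iii)}$$

$$ClusRel_{jn} = \prod_{i=1}^{M} \exp\left\{-\left[(\mu_{ij} + \lambda_{kn})\, E_{ijkn}\, x_{ijk} + \sum_{h=1}^{i-1} (\mu_{ij} + \xi_{kl})\, w\, (B_{ihj} \cdot D_{kln})\, x_{ijk}\, x_{hjl} + \lambda_{kn}\, T_{prkn}\right]\right\} \qquad \text{(iv)}$$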
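To make the structure of Box 1 concrete, the following sketch evaluates the product of per-module exponential terms for a fixed allocation. It is a hedged illustration only: the excerpt does not define $\mu$ and $\xi$, so they are treated here as opaque rate parameters; the placement of the sum over previous modules is an assumed reading of eq. (iv); and each communication cost is passed in already scaled (that is, as $w \cdot B \cdot D$). The names `mod_rel` and `clus_rel` and their parameters are hypothetical.

```python
import math


def mod_rel(mu, lam, exec_time, comm_terms, t_present):
    """Reliability term of one allocated module, following the exponent of eq. (iv).

    mu, lam:    rate parameters attached to the module and its node
    exec_time:  execution time E of the module on its node
    comm_terms: list of (xi, cost) pairs, one per previous module, where
                cost is the already scaled communication cost w * B * D
    t_present:  time to finish the modules already present on the node
    """
    exponent = (mu + lam) * exec_time
    exponent += sum((mu + xi) * cost for xi, cost in comm_terms)
    exponent += lam * t_present
    return math.exp(-exponent)


def clus_rel(modules):
    """Product of the per-module reliabilities, as in eq. (iii)."""
    return math.prod(mod_rel(**m) for m in modules)


modules = [
    {"mu": 0.01, "lam": 0.02, "exec_time": 10.0, "comm_terms": [], "t_present": 5.0},
    {"mu": 0.01, "lam": 0.01, "exec_time": 8.0, "comm_terms": [(0.005, 4.0)], "t_present": 0.0},
]
print(clus_rel(modules))
```

Because every module contributes a factor of the form exp(-x) with x ≥ 0, adding modules, longer execution times, or costlier communication can only lower the product, which matches the intuition that larger or slower allocations are less reliable.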
 