Step 9: [handle failure of local detectors]
The group leader can detect a local detector's failure and, if one fails, simply
add a new one and let the monitored objects send “I'm alive” messages to it.
Step 10: [handle failure of group leader]
The top failure detector and the index service can detect a group leader's
failure and, if a group leader fails, deploy a new one on the same host
that is registered with the index service.
Step 11: [merge small groups]
When the number of monitored objects decreases, compute the number of groups needed as
\[ n = \lfloor 2 \log_{S_g} N_{\text{total}} \rfloor \tag{4.12} \]
Choose $S_g$ as in Equation 4.13.
\[ S_g = \lfloor N_{\text{total}} \lg N_{\text{total}} \rfloor \tag{4.13} \]
The network condition between two failure detectors is evaluated as in
Equation 4.14:
\[ \forall i, j:\; T_{ij} \le \Phi,\; i, j \in \text{LAN} \;\rightarrow\; \text{good} \tag{4.14} \]
Try to annex the smallest group to the second smallest one if the network
condition between them is good enough, and repeat this process until the
number of groups is less than n. With the algorithms presented above, the
failure detector can adapt to different grid environments.
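The merging loop of Step 11 can be illustrated with a minimal Python sketch. It is not the authors' implementation. The names merge_small_groups, groups, and network_good are hypothetical: groups maps each group leader to its set of monitored objects, n is the number of groups computed beforehand, and network_good stands in for the test of Equation 4.14. The text does not say what happens when the network condition is not good, so the sketch assumes the loop simply stops.

```python
from typing import Callable, Dict, Set


def merge_small_groups(
    groups: Dict[str, Set[str]],
    n: int,
    network_good: Callable[[str, str], bool],
) -> Dict[str, Set[str]]:
    """Annex the smallest group to the second smallest until fewer than n remain."""
    # Work on a copy so the caller's mapping is left untouched.
    merged = {leader: set(members) for leader, members in groups.items()}
    while len(merged) >= n and len(merged) >= 2:
        # Order leaders by group size; ties are broken by name for determinism.
        by_size = sorted(merged, key=lambda g: (len(merged[g]), g))
        smallest, second = by_size[0], by_size[1]
        if not network_good(smallest, second):
            break  # assumption: stop instead of trying other pairs
        # The second smallest group absorbs the members of the smallest one.
        merged[second] |= merged.pop(smallest)
    return merged


# Usage: four groups, at most two wanted (n = 3), every link treated as good.
groups = {
    "g1": {"a"},
    "g2": {"b", "c"},
    "g3": {"d", "e", "f"},
    "g4": {"h", "i", "j", "k"},
}
result = merge_small_groups(groups, n=3, network_good=lambda i, j: True)
```

In this run g1 is annexed to g2, and the enlarged g2 is then annexed to g3, leaving the two groups g3 and g4. Passing the predicate as a parameter keeps the sketch independent of how the network condition is actually measured.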
4.5.4 Adaptive Application Fault Tolerance
There are different types of failures in grid systems and grid applications
due to the diverse nature of the grid components and grid applications.
The existing failure handling techniques in distributed systems, parallel
systems, or even grid systems address failure handling with one scheme,
which cannot handle failures in grids with different semantics. The failure
handling method in grids should address the following requirements:
• Support diverse failure handling strategies. This is driven by the
  heterogeneous nature of the grid context, such as heterogeneous
  tasks and heterogeneous execution environments.
• Separation of failure handling policies from application codes.
  This is driven by the dynamic nature of the grid system. The
  separation of policies from the application codes provides a high-level
 