China Grid and Related Dependability Research - Grid Computing: Infrastructure, Service, and Applications


Step 9: [handle failure of local detectors]

The group leader can detect a local detector's failure; if a local detector fails, the group leader simply adds a new one and lets the monitored objects send “I'm alive” messages to it.

Step 10: [handle failure of group leader]

The top failure detector and the index service can detect a group leader's failure; if a group leader fails, a new one is simply deployed on the same host which is registered to the index service.
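Steps 9 and 10 follow the same pattern: a supervisor notices that a detector has gone silent and starts a replacement. A minimal sketch of that pattern follows. The names `HeartbeatMonitor`, `replace_failed`, the `timeout` parameter, and the `deploy` callback are hypothetical and not part of the original algorithm, and a fixed timeout is assumed only to keep the example short.

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatMonitor:
    """Records the last "I'm alive" time per sender and flags silent senders."""
    timeout: float  # assumption: seconds of silence before a sender is suspected
    last_seen: dict = field(default_factory=dict)

    def heartbeat(self, sender: str, now: float) -> None:
        # Called whenever an "I'm alive" message arrives from `sender`.
        self.last_seen[sender] = now

    def suspected(self, now: float) -> list:
        # Senders whose last message is older than the timeout.
        return sorted(s for s, t in self.last_seen.items()
                      if now - t > self.timeout)


def replace_failed(monitor: HeartbeatMonitor, now: float, deploy) -> dict:
    """Deploy a replacement for each suspected detector and start tracking it.

    `deploy` is a hypothetical callback that takes the failed detector's id
    and returns the id of the newly started detector.
    """
    replaced = {}
    for failed in monitor.suspected(now):
        new_id = deploy(failed)
        del monitor.last_seen[failed]
        monitor.heartbeat(new_id, now)
        replaced[failed] = new_id
    return replaced


# Usage: a group leader watching two local detectors.
leader = HeartbeatMonitor(timeout=3.0)
leader.heartbeat("detector-a", now=0.0)
leader.heartbeat("detector-b", now=0.0)
leader.heartbeat("detector-b", now=4.0)  # detector-a stays silent
replaced = replace_failed(leader, now=5.0, deploy=lambda old: old + "-new")
```

The same monitor could sit at the top failure detector to watch group leaders; only the `deploy` callback would differ.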

Step 11: [merge small groups]

When the number of monitored objects decreases, compute the number of groups needed as

$$n = \left\lfloor 2 \log_{S_g} N_{\mathrm{total}} \right\rfloor \qquad (4.12)$$

Choose $S_g$ as in Equation 4.13.

$$S_g = \left\lfloor N \lg N_{\mathrm{total}} \right\rfloor \qquad (4.13)$$

The network condition between two failure detectors is evaluated as in Equation 4.14:

$$\forall i, j:\quad T_{ij} \le \Phi,\ \ i, j \in \mathrm{LAN} \;\rightarrow\; \mathrm{good} \qquad (4.14)$$

Try to annex the smallest group to the second smallest one if their network conditions are good enough, and repeat this process until the number of groups is less than n. With the algorithms presented above, the failure detector can adapt to different grid environments.
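The merge loop of Step 11 can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the target group count `n` is passed in rather than computed, the loop stops when the two smallest groups do not have a good connection (the text does not say what happens in that case), and `make_is_good` reads Equation 4.14 as "good if a latency-like value is at most a threshold `phi`, or both detectors share a LAN". The function names, the `latency` and `lan_of` tables, and that reading of the equation are all hypothetical.

```python
def make_is_good(latency, lan_of, phi):
    """Build a network-condition test between two group leaders.

    Assumption: either a low latency or a shared LAN counts as "good".
    `latency` maps frozenset({i, j}) to a measured value; missing pairs
    are treated as infinitely slow.
    """
    def is_good(i, j):
        t_ij = latency.get(frozenset((i, j)), float("inf"))
        return t_ij <= phi or lan_of[i] == lan_of[j]
    return is_good


def merge_small_groups(groups, n, is_good):
    """Annex the smallest group to the second smallest until fewer than n remain.

    `groups` maps a group leader id to the set of objects it monitors.
    The input is not modified; a merged copy is returned.
    """
    groups = {leader: set(members) for leader, members in groups.items()}
    while len(groups) >= n and len(groups) >= 2:
        # Order by size, breaking ties by leader id for determinism.
        by_size = sorted(groups, key=lambda k: (len(groups[k]), k))
        smallest, second = by_size[0], by_size[1]
        if not is_good(smallest, second):
            break  # assumption: give up rather than try other pairs
        groups[second] |= groups.pop(smallest)
    return groups


# Usage with four groups and a target of n = 3.
groups = {
    "g1": {"a"},
    "g2": {"b", "c"},
    "g3": {"d", "e", "f"},
    "g4": {"h", "i", "j", "k"},
}
is_good = make_is_good(
    latency={frozenset(("g1", "g2")): 5.0, frozenset(("g2", "g3")): 50.0},
    lan_of={"g1": "lan-a", "g2": "lan-b", "g3": "lan-b", "g4": "lan-c"},
    phi=10.0,
)
merged = merge_small_groups(groups, n=3, is_good=is_good)
```

Because the stopping rule is "less than n", the loop performs two merges here and leaves two groups, led by `g3` and `g4`.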

4.5.4 Adaptive Application Fault Tolerance

Grid systems and grid applications exhibit different types of failures because of the diverse nature of grid components and grid applications. The existing failure handling techniques in distributed systems, parallel systems, and even grid systems address failure handling with a single scheme, which cannot handle failures in grids with different semantics. The failure handling method in grids should address the following requirements:

• Support diverse failure handling strategies. This is driven by the heterogeneous nature of the grid context, such as heterogeneous tasks and heterogeneous execution environments.

• Separation of failure handling policies from application codes. This is driven by the dynamic nature of the grid system. The separation of policies from the application codes provides a high-level
