China Grid and Related Dependability Research - Grid Computing: Infrastructure, Service, and Applications

With the above definitions, we define the QoS of a service $s$ as a tuple:

$$Q(s) = [Q_{et}(s),\ Q_{co}(s)] \tag{4.16}$$

4.5.4.3.2 Quality Criteria

In this section, the quality criteria for different failure-recovery policies are presented. For simplicity and feasibility, only five policies are discussed in DRIC: retrying, replication, checkpointing, retrying with replication, and replication with checkpointing. Workflow methods such as alternative task are specified by the user; the system cannot control them and only executes them. The five failure-recovery policies are defined as follows.

1. Retrying: Given a task $[t_1, t_2, \ldots, t_n]$, if the task fails at $t_i$, it is re-executed from $t_1$.

2. Replication: Given a task $[t_1, t_2, \ldots, t_n]$, there are $m$ replicas of the task: $[t_{11}, t_{12}, \ldots, t_{1n}]$, $[t_{21}, t_{22}, \ldots, t_{2n}]$, $\ldots$, $[t_{m1}, t_{m2}, \ldots, t_{mn}]$. If replica $i$ fails at $t_{ij}$, replica $i$ is killed. If any replica finishes successfully at $t_{in}$, the other replicas are killed.

3. Checkpointing: Given a task $[t_1, t_2, \ldots, t_n]$, the running states are saved to a file. If the task fails at $t_i$, it is resumed from $t_i$.

4. Replication with checkpointing: Given a task $[t_1, t_2, \ldots, t_n]$, there are $m$ replicas of the task: $[t_{11}, t_{12}, \ldots, t_{1n}]$, $[t_{21}, t_{22}, \ldots, t_{2n}]$, $\ldots$, $[t_{m1}, t_{m2}, \ldots, t_{mn}]$. The running states are saved to files. If replica $i$ fails at $t_{ij}$, it is resumed from $t_{ij}$. If any replica finishes successfully at $t_{in}$, the other replicas are killed.

5. Retrying with replication: Given a task $[t_1, t_2, \ldots, t_n]$, there are $m$ replicas of the task: $[t_{11}, t_{12}, \ldots, t_{1n}]$, $[t_{21}, t_{22}, \ldots, t_{2n}]$, $\ldots$, $[t_{m1}, t_{m2}, \ldots, t_{mn}]$. If replica $i$ fails at $t_{ij}$, it is re-executed from its first state, $t_{i1}$. If any replica finishes successfully at $t_{in}$, the other replicas are killed.
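The difference between the first three policies can be made concrete with a small counting model. The sketch below is illustrative only and is not taken from DRIC: the function names, the representation of failures as a list of 1-based state indices, and the convention that a failed state counts as one wasted execution are all assumptions made for this example.

```python
from typing import Optional, Sequence


def _check(n: int, failures: Sequence[int]) -> None:
    # Hypothetical helper: failure indices are 1-based state numbers.
    if any(not 1 <= i <= n for i in failures):
        raise ValueError("failure indices must lie in 1..n")


def run_retrying(n: int, failures: Sequence[int]) -> int:
    """Policy 1: every failure at t_i restarts the task from t_1.

    Returns the number of state executions until the task completes.
    """
    _check(n, failures)
    # Each failed attempt ran t_1..t_i; the final attempt runs all n states.
    return sum(failures) + n


def run_checkpointing(n: int, failures: Sequence[int]) -> int:
    """Policy 3: every failure at t_i resumes the task from t_i."""
    _check(n, failures)
    executed, position = 0, 1          # position = next state to run
    for i in failures:
        if i < position:
            raise ValueError("a resumed task cannot fail before its checkpoint")
        executed += i - position + 1   # ran t_position..t_i, then t_i failed
        position = i                   # saved state restored: redo t_i
    return executed + (n - position + 1)


def run_replication(replica_failures: Sequence[Optional[int]]) -> Optional[int]:
    """Policy 2: a failed replica is killed and never restarted.

    replica_failures[k] is the state at which replica k fails, or None if
    it never fails. Returns the 0-based index of the lowest-numbered
    replica that finishes, or None if every replica fails.
    """
    for index, failed_at in enumerate(replica_failures):
        if failed_at is None:
            return index
    return None
```

Under these assumptions, a task with five states that fails once at $t_3$ costs `run_retrying(5, [3]) == 8` state executions but only `run_checkpointing(5, [3]) == 6`, since checkpointing repeats only the failed state. The two combined policies apply the same restart or resume rule to each replica independently.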

With the failure-recovery policies defined, the QoS criteria for fault tolerance are presented. For simplicity and feasibility, some propositions about failure behavior in grid computing are given first.

Assumption 1

1. The time a task spends in state $t_i$ is the duration for which the task performs a specific operation on a specific service.

2. The switching time from state $t_i$ to state $t_{i+1}$ is zero; that is, the task enters state $t_{i+1}$ immediately when state $t_i$ finishes.
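One consequence of Assumption 1 is that, in a failure-free run, the time at which each state finishes is simply the running sum of the state durations, with nothing added for transitions. The following sketch illustrates this; the function name `completion_times` and the list-of-durations input are assumptions made for the example, not notation from the text.

```python
from itertools import accumulate
from typing import List, Sequence


def completion_times(durations: Sequence[float]) -> List[float]:
    """Return the finish time of each state t_1..t_n in a failure-free run.

    durations[i] is the time spent in state t_(i+1). Because switching
    between states takes zero time, each state starts the instant the
    previous one finishes, so finish times are cumulative sums.
    """
    return list(accumulate(durations))
```

For durations `[2.0, 3.0, 1.5]`, the states finish at `[2.0, 5.0, 6.5]`, and the last entry is the total execution time of the task.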
