With the above definitions, we define the QoS of a service s as a tuple:
$$Q(s) = \left[\, Q_{et}(s),\; Q_{co}(s) \,\right] \qquad (4.16)$$
4.5.4.3.2 Quality Criteria
In this section, we present the quality criteria for different failure-recovery policies. For simplicity and feasibility, DRIC considers only five policies: retrying, replication, checkpointing, retrying with replication, and replication with checkpointing. Workflow-level methods such as alternative tasks are specified by the user; the system cannot control them and can only execute them. The five failure-recovery policies are defined as follows.
1. Retrying: Given a task $[t_1, t_2, \ldots, t_n]$, if the task fails at $t_i$, it is re-executed from $t_1$.
2. Replication: Given a task $[t_1, t_2, \ldots, t_n]$, there are $m$ replicas of this task: $[t_{11}, t_{12}, \ldots, t_{1n}], [t_{21}, t_{22}, \ldots, t_{2n}], \ldots, [t_{m1}, t_{m2}, \ldots, t_{mn}]$. If replica $i$ fails at $t_{ij}$, it is killed. If any replica $i$ successfully finishes at $t_{in}$, the other replicas are killed.
3. Checkpointing: Given a task $[t_1, t_2, \ldots, t_n]$, the running states are saved to a file. If the task fails at $t_i$, it is re-executed from $t_i$ using the saved state.
4. Replication with checkpointing: Given a task $[t_1, t_2, \ldots, t_n]$, there are $m$ replicas of this task: $[t_{11}, t_{12}, \ldots, t_{1n}], [t_{21}, t_{22}, \ldots, t_{2n}], \ldots, [t_{m1}, t_{m2}, \ldots, t_{mn}]$. The running states are saved to files. If replica $i$ fails at $t_{ij}$, it is re-executed from $t_{ij}$. If any replica $i$ successfully finishes at $t_{in}$, the other replicas are killed.
5. Retrying with replication: Given a task $[t_1, t_2, \ldots, t_n]$, there are $m$ replicas of this task: $[t_{11}, t_{12}, \ldots, t_{1n}], [t_{21}, t_{22}, \ldots, t_{2n}], \ldots, [t_{m1}, t_{m2}, \ldots, t_{mn}]$. If replica $i$ fails at $t_{ij}$, it is re-executed from its first state $t_{i1}$, as in retrying. If any replica $i$ successfully finishes at $t_{in}$, the other replicas are killed.
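The five policies can be compared on a toy failure trace. The Python sketch below is a hypothetical model, not DRIC's implementation: a task is a list of state durations, a failure "at $t_i$" is assumed to happen at the end of $t_i$ (so all of $t_i$ is lost), saving checkpoints is assumed to cost nothing, and the function names and the failure-list encoding (0-based state indices) are illustrative. For retrying with replication, each replica is assumed to recover as in policy 1, that is, by restarting from its first state.

```python
from typing import Optional, Sequence

# Hypothetical failure model (an assumption, not DRIC's implementation):
# - a task (or replica) is a list of state durations [d_1, ..., d_n];
# - state switches take zero time (Assumption 1) and checkpoints are free;
# - a failure "at t_i" happens at the end of t_i, so all of t_i is lost;
# - failure indices are 0-based positions in the duration list.


def retrying(durations: Sequence[float], failures: Sequence[int]) -> float:
    """Policy 1: a failure at index i wastes every state up to and including i,
    then the task restarts from t_1. The attempt after the last failure succeeds."""
    wasted = sum(sum(durations[: i + 1]) for i in failures)
    return wasted + sum(durations)


def checkpointing(durations: Sequence[float], failures: Sequence[int]) -> float:
    """Policy 3: a failure at index i repeats only that state from the saved state."""
    return sum(durations) + sum(durations[i] for i in failures)


def replication(replicas: Sequence[Sequence[float]],
                failures: Sequence[Optional[int]]) -> Optional[float]:
    """Policy 2: replica k is killed if it fails (failures[k] is not None);
    the task ends when the fastest surviving replica finishes, or None if
    every replica fails."""
    finish_times = [sum(d) for d, f in zip(replicas, failures) if f is None]
    return min(finish_times, default=None)


def replication_with_checkpointing(replicas: Sequence[Sequence[float]],
                                   failures: Sequence[Sequence[int]]) -> float:
    """Policy 4: every replica recovers by checkpointing; the first to finish wins."""
    return min(checkpointing(d, f) for d, f in zip(replicas, failures))


def retrying_with_replication(replicas: Sequence[Sequence[float]],
                              failures: Sequence[Sequence[int]]) -> float:
    """Policy 5: every replica recovers by restarting from its first state
    (modeled as in policy 1); the first to finish wins."""
    return min(retrying(d, f) for d, f in zip(replicas, failures))


durations = [2.0, 3.0, 5.0]
print(retrying(durations, [1]))       # 15.0
print(checkpointing(durations, [1]))  # 13.0
```

With one failure at the second state, retrying pays for the first two states again (15 time units in total), while checkpointing repeats only the failed state (13 time units). The replicated variants simply take the minimum completion time over replicas, which reflects the rule that the first replica to finish kills the others.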
Having defined the failure-recovery policies, we now present the QoS criteria for fault tolerance. To keep the expressions simple and tractable, we first state some assumptions about failure behavior in grid computing.
Assumption 1
1. The time a task spends in state $t_i$ is the duration for which it performs a specific operation on a specific service.
2. Switching from state $t_i$ to state $t_{i+1}$ takes zero time; that is, the task enters state $t_{i+1}$ immediately when state $t_i$ finishes.
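Assumption 1 means that a failure-free run of a task is fully described by its state durations. A minimal Python sketch (the function name `state_timeline` is hypothetical) computes each state's start and end times as prefix sums of the durations, so consecutive states share a boundary and no time is lost between them:

```python
from itertools import accumulate
from typing import List, Sequence, Tuple


def state_timeline(durations: Sequence[float]) -> List[Tuple[int, float, float]]:
    """Return (i, start, end) for each state t_i (1-based) of a failure-free run.

    Assumption 1: state t_i lasts exactly durations[i - 1], and t_{i+1} starts the
    instant t_i finishes, so the boundaries are prefix sums with no gaps.
    Requires Python 3.8+ for accumulate(..., initial=...).
    """
    bounds = list(accumulate(durations, initial=0.0))
    return [(i + 1, bounds[i], bounds[i + 1]) for i in range(len(durations))]


print(state_timeline([2.0, 3.0, 5.0]))
# [(1, 0.0, 2.0), (2, 2.0, 5.0), (3, 5.0, 10.0)]
```

The end time of the last state (10.0 here) is the task's total running time, which is the quantity the failure-recovery policies extend when a failure forces states to be repeated.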