China Grid and Related Dependability Research - Grid Computing: Infrastructure, Service, and Applications


handling method without having any knowledge about the grid resources and the applications.

• Task-specific failure handling support. This is driven by the diversity of grid applications and grid resources. The user should be able to specify an appropriate failure-handling method based on performance and cost considerations.
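The two requirements above (failure handling that needs no knowledge of the grid, and failure handling the user can tailor to a task) can be illustrated with a hypothetical task description in which the failure-handling method is optional. This is a minimal sketch under stated assumptions, not DRIC's interface: the names `TaskSpec`, `resolve_method`, `SUPPORTED_METHODS`, the extra `retry` method, and the default choice are all invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical catalogue of methods a user may request. The first three are
# named in this section; "retry" (resubmit the task) is an assumed extra.
SUPPORTED_METHODS = {"checkpointing", "replication", "workflow", "retry"}

# Hypothetical stand-in for the system's own choice when the user is silent.
DEFAULT_METHOD = "retry"


@dataclass
class TaskSpec:
    """Illustrative task description, not DRIC's submission format."""
    name: str
    executable: str
    failure_handling: Optional[str] = None  # None: let the system decide


def resolve_method(task: TaskSpec) -> str:
    """Pick the failure-handling method for a task."""
    if task.failure_handling is None:
        # Transparent support: the user needs no knowledge of the grid.
        return DEFAULT_METHOD
    if task.failure_handling not in SUPPORTED_METHODS:
        raise ValueError(f"unsupported method: {task.failure_handling!r}")
    # Task-specific support: honor the user's performance/cost choice.
    return task.failure_handling


if __name__ == "__main__":
    print(resolve_method(TaskSpec("render", "/bin/render")))             # retry
    print(resolve_method(TaskSpec("sim", "/bin/sim", "checkpointing")))  # checkpointing
```

Keeping the method optional lets the same submission path serve both kinds of user: the default stands in for whatever the system would decide on its own, while an explicit value overrides it.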

This section presents an adaptive, policy-based fault-tolerance approach called DRIC. It first gives an overview of the approach, then reviews application-level fault-tolerance techniques, and finally presents the adaptive model of failure handling.

4.5.4.1 Overview of Failure Handling

As depicted in Figure 4.12, which gives an overview of the failure-handling approach, fault tolerance in the DRIC framework comprises two phases: failure detection and failure handling. The failure-handling approach uses a decision-making method to meet the QoS requirements specified by users, and it integrates almost all kinds of application-level failure-recovery techniques, such as checkpointing [59-62], replication [63-66], and workflow [67]. First, the user submits a task with QoS requirements. The policy engine then analyzes these QoS requirements together with the attributes of the application to construct a failure-recovery policy. Finally, the policy engine carries out this policy using the appropriate techniques, with the help of the job management, data management, and information management services illustrated in Figure 4.13.
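The decision step described above, in which the policy engine turns QoS requirements and application attributes into a failure-recovery policy, can be sketched as a small rule-based function. Everything concrete here is an assumption for illustration: the QoS fields (`deadline_s`, `max_replicas`), the application attributes, the 50% slack threshold, and the `retry` fallback. The source does not specify DRIC's actual decision rules.

```python
from dataclasses import dataclass


@dataclass
class QoS:
    """User-stated QoS requirements (illustrative fields)."""
    deadline_s: float   # wall-clock deadline for the task, in seconds
    max_replicas: int   # number of parallel copies the user is willing to pay for


@dataclass
class AppAttributes:
    """Application attributes, e.g. as gathered by a data collector (assumed)."""
    est_runtime_s: float  # estimated failure-free runtime, in seconds
    checkpointable: bool  # can the application save and restore its state?
    is_workflow: bool     # is the task a multi-step workflow?


@dataclass
class Policy:
    technique: str
    replicas: int = 1


def make_policy(qos: QoS, app: AppAttributes) -> Policy:
    """Hypothetical policy maker: map QoS and attributes to one technique."""
    if app.is_workflow:
        # Handle failures at the workflow level (e.g. re-run failed steps).
        return Policy("workflow")
    slack = qos.deadline_s - app.est_runtime_s
    if slack < 0.5 * app.est_runtime_s and qos.max_replicas > 1:
        # Too little slack to recover after a failure: run redundant copies.
        return Policy("replication", replicas=qos.max_replicas)
    if app.checkpointable:
        # Enough slack or no budget for copies: roll back to the last checkpoint.
        return Policy("checkpointing")
    # Assumed fallback: resubmit the whole task on another resource.
    return Policy("retry")


if __name__ == "__main__":
    tight = make_policy(QoS(deadline_s=110, max_replicas=3),
                        AppAttributes(est_runtime_s=100, checkpointable=True,
                                      is_workflow=False))
    print(tight)  # Policy(technique='replication', replicas=3)
```

The rule order reflects a common trade-off: replication costs extra resources but avoids restart delay, whereas checkpointing is cheaper but needs time to roll back and recompute, so it suits tasks whose deadlines leave some slack.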

[Figure components: application/user interface; failure detector; failure handling; data collector; policy maker; policy executor; index service; failure-recovery techniques; heartbeat monitor; resource management]

FIGURE 4.12 Overview of failure handling.
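Figure 4.12 includes a heartbeat monitor alongside the failure detector. One common way to realize heartbeat-based detection is a timeout detector: each monitored resource reports periodically, and a resource is suspected to have failed once no heartbeat has arrived within a timeout. The sketch below assumes that scheme rather than describing DRIC's actual protocol; the class name, method names, and timeout value are hypothetical.

```python
import time
from typing import Callable, Dict, List


class HeartbeatFailureDetector:
    """Timeout-based failure detector (illustrative sketch, not DRIC's code)."""

    def __init__(self, timeout_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout_s = timeout_s  # assumed: silence longer than this => suspect
        self.clock = clock          # injectable clock makes the logic testable
        self.last_seen: Dict[str, float] = {}

    def heartbeat(self, resource_id: str) -> None:
        """Record a heartbeat from a resource (called by the heartbeat monitor)."""
        self.last_seen[resource_id] = self.clock()

    def suspected(self) -> List[str]:
        """Return resources whose last heartbeat is older than the timeout."""
        now = self.clock()
        return sorted(r for r, t in self.last_seen.items()
                      if now - t > self.timeout_s)


if __name__ == "__main__":
    fake_now = [0.0]
    detector = HeartbeatFailureDetector(timeout_s=5.0, clock=lambda: fake_now[0])
    detector.heartbeat("node-a")
    detector.heartbeat("node-b")
    fake_now[0] = 4.0
    detector.heartbeat("node-b")  # node-b keeps reporting
    fake_now[0] = 7.0             # node-a has been silent for 7 s > 5 s
    print(detector.suspected())   # ['node-a']
```

In a complete system, the resources returned by `suspected()` would be handed to the failure-handling phase. The timeout trades detection speed against false suspicions caused by slow or congested networks.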
