4.5.2.3 Fault Tolerance of the System-Level Component
4.5.2.4 Task Fault Tolerance
4.5.3 Grid Fault Detection
4.5.3.1 Architecture
4.5.3.2 Adaptive Model
4.5.4 Adaptive Application Fault Tolerance
4.5.4.1 Overview of Failure Handling
4.5.4.2 Application-Level Fault-Tolerance Techniques
4.5.4.3 Model of Policy Making
4.6 Conclusion
Acknowledgments
References
4.1 Introduction
In recent years, grid computing has become very popular for its potential
to aggregate high-performance computational and large-scale storage
resources distributed across the Internet. According to [1,2], grid
computing is “resource sharing and coordinated problem solving in
dynamic, multi-institutional virtual organizations.” The purpose of grid
computing is to eliminate resource islands at the application level and
to make computing and services ubiquitous.
Grid computing rests on three prerequisites: network infrastructure,
wide-area distribution of computational resources, and a continuously
increasing demand for resource sharing. Nearly all existing grid computing
projects, such as the UK e-Science Programme [3], Information Power Grid
(IPG) [4], and TeraGrid [5], are built on existing network infrastructure.
In TeraGrid, the five key grid computing sites are interconnected via 30 or
40 Gb/s network connections. The ChinaGrid project [6], which is discussed
in detail in this chapter, is likewise based on a long-running network
infrastructure, the China Education and Research Network (CERNET) [7].
Grid computing opens new possibilities for distributed computing, but it
also brings many challenges. Because of the specific features of grids (such
as dynamism, large scale, and wide-area distribution), constructing
dependable grid services is difficult; for example: (1) a power failure
can cut power to one part of the distributed system; (2) natural events or
human acts can physically damage the grid computing fabric; and (3) a
failure of system or application software can lead to loss of
service. Owing to the diverse failures and error conditions in the grid