4.5.3 Grid Fault Detection
Fault detection is well known as a fundamental building block for dependable systems [50-53]. In ChinaGrid, the Cluster and Grid Computing Lab at Huazhong University of Science and Technology has designed an adaptive fault detection service named ALTER. ALTER addresses the scalability and flexibility requirements of the grid system [54,55] by blending the unreliable failure detectors of distributed systems [56] with R-GMA (relational grid monitoring architecture). ALTER can fine-tune system performance to meet different QoS requirements, and it can change its topology in response to changes in the grid environment, such as the addition of resources, the crash of key components, or even the crash of failure detectors themselves.
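The idea of an adaptive, unreliable failure detector can be illustrated with a small heartbeat-based sketch. This is not ALTER's published algorithm; it is a generic, hypothetical example that assumes a monitored object sends periodic "I'm alive" heartbeats and that the detector predicts the next arrival from a sliding window of recent inter-arrival gaps plus a safety margin. The class name `AdaptiveHeartbeatDetector` and the parameters `alpha` and `window` are illustrative assumptions. The detector is "unreliable" in the technical sense: it may wrongly suspect a slow but live object, and it withdraws the suspicion as soon as a late heartbeat arrives.

```python
from collections import deque
from statistics import mean
from typing import Optional


class AdaptiveHeartbeatDetector:
    """Hypothetical sketch of an adaptive, unreliable failure detector.

    The next heartbeat is expected at:
        last_arrival + mean(recent gaps) + alpha
    `alpha` is an assumed QoS knob: a larger margin gives fewer false
    suspicions but slower detection of real crashes.
    """

    def __init__(self, alpha: float, window: int = 10) -> None:
        self.alpha = alpha                  # safety margin in seconds (assumption)
        self.gaps = deque(maxlen=window)    # sliding window of inter-arrival gaps
        self.last_arrival: Optional[float] = None

    def heartbeat(self, now: float) -> None:
        """Record an "I'm alive" message received at time `now`."""
        if self.last_arrival is not None:
            self.gaps.append(now - self.last_arrival)
        self.last_arrival = now

    def deadline(self) -> Optional[float]:
        """Time after which the object is suspected, or None if no estimate yet."""
        if self.last_arrival is None or not self.gaps:
            return None
        return self.last_arrival + mean(self.gaps) + self.alpha

    def suspects(self, now: float) -> bool:
        """True if no heartbeat has arrived before the adaptive deadline."""
        d = self.deadline()
        return d is not None and now > d


if __name__ == "__main__":
    det = AdaptiveHeartbeatDetector(alpha=0.5)
    for t in (0.0, 1.0, 2.1, 3.0):          # heartbeats roughly once per second
        det.heartbeat(t)
    print(det.suspects(4.2))  # False: deadline is 3.0 + 1.0 + 0.5 = 4.5
    print(det.suspects(4.6))  # True: the deadline has passed
    det.heartbeat(4.7)        # a late heartbeat withdraws the suspicion
    print(det.suspects(4.8))  # False
```

Raising `alpha` lowers the rate of false suspicions at the cost of slower detection, and lowering it does the opposite; adjusting a single margin like this is one simple way a detector could trade off QoS goals such as detection time and mistake rate.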
4.5.3.1 Architecture
ALTER is organized in a hierarchical structure, as shown in Figure 4.11. The system architecture is composed of two levels: local groups and global groups. Each local group has a unique group leader. Failure detectors in a local group monitor the objects in the local space; the monitored objects in one local space may reside in a single LAN or span several LANs, but the network conditions within one space should be good. Failure detectors in the global groups monitor the global objects by monitoring the detectors in the local groups. Thus, this architecture has two different types of failure detectors: local failure detectors and group leaders. The monitored objects periodically send “I'm alive” messages to the local failure detectors, whereas the messages a group leader sends are lists of the monitored objects and their status, and the group leaders share failure detection messages using epidemic methods [57]. For management simplicity, the system implementation includes an index service, which works as a directory registry of the global group failure detectors and the local group failure detectors. The index service also provides some decision-making capability for organizing the group leaders. There are three components or roles in the ALTER system: consumers, producers, and an index service. A consumer who wants to detect some objects in a grid first queries
FIGURE 4.11
Architecture of ALTER. (Diagram labels: index server; upper level group; Group 1, Group 2, Group 3, ..., Group n; LAN1, LAN2, LAN3, LAN4, LAN5, ..., LANm.)