recording local resources and registers them with its related GIIS. To
simplify task fault tolerance, we focus only on sequential programs; that is,
a task is submitted to only one grid node at a time.
The workflow of task fault detection is depicted in Figure 4.7. A data collector
is configured in the GRB. When tasks are submitted to the GRB, the
GRB queries a GIIS over LDAP for registered resources, retrieves the required
resource information, and assigns a suitable resource to each submitted
task according to the task's requirements. Tasks with a fault-tolerant
requirement register with the data collector in the GRB and are scheduled to a
selected grid node. At the same time, a callback mechanism is added to the
submitted task. The grid resource management component verifies the
corresponding scheduled task [49]. If the task is legal, the grid resource
management component accepts it; otherwise, the task is refused.
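The matching and registration steps described above can be illustrated with a minimal sketch. It is not the GRB implementation: the `Resource`, `Task`, and `DataCollector` classes and the `select_resource` and `schedule` functions are hypothetical names introduced here, the in-memory resource list stands in for the result of the LDAP query against the GIIS, and the first-fit matching rule is an assumption, since the text does not say how a "suitable" resource is chosen.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Resource:
    """One registered grid node (stand-in for a GIIS entry; hypothetical fields)."""
    node: str
    cpus: int
    memory_mb: int


@dataclass
class Task:
    """A submitted sequential task with its resource requirements."""
    name: str
    min_cpus: int
    min_memory_mb: int
    fault_tolerant: bool = False


@dataclass
class DataCollector:
    """Tracks fault-tolerant tasks: task name -> node the task is scheduled on."""
    registered: Dict[str, str] = field(default_factory=dict)

    def register(self, task: Task, node: str) -> None:
        self.registered[task.name] = node


def select_resource(task: Task, resources: List[Resource]) -> Optional[Resource]:
    """Return the first resource meeting the task's requirements (assumed first-fit rule)."""
    for resource in resources:
        if resource.cpus >= task.min_cpus and resource.memory_mb >= task.min_memory_mb:
            return resource
    return None


def schedule(task: Task, resources: List[Resource], collector: DataCollector) -> Optional[str]:
    """Pick a node for the task; register it with the collector if it is fault tolerant."""
    resource = select_resource(task, resources)
    if resource is None:
        return None  # no registered resource satisfies the task
    if task.fault_tolerant:
        collector.register(task, resource.node)
    return resource.node  # a sequential task goes to exactly one node
```

Because only sequential programs are considered, `schedule` returns a single node name per task; tasks without a fault-tolerant requirement are scheduled but never appear in the collector.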
The fault detection work that user program code would otherwise perform is
handed over to GFTP, rather than being implemented in the program with the
Globus fault detection APIs. The resource management component analyzes the
task's resource specification language (RSL) description and selects a
job-scheduling system, such as fork or LSF. The selected job-scheduling system
starts the task on the local node and registers it with the local monitor. The
grid resource management component provides the local monitor with
the identity of the user task process to be monitored, the data collector to
which the process heartbeat is to be sent, and a heartbeat interval. The
local monitor is responsible for detecting the task status and reporting it to
the data collector. If the process terminates, the component disconnects it
FIGURE 4.7
Task fault detection. (Diagram labels: Task 1 and Task 2; grid resource broker with data collector; grid information index server; local components and a local monitor on each grid node; arrows labeled Register, Schedule, Monitor, and Status feedback.)
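The heartbeat exchange between the local monitor and the data collector can be sketched as follows. This is an illustration under stated assumptions, not the Globus or GFTP API: `HeartbeatCollector`, `LocalMonitor`, and the `tolerance` parameter are hypothetical, and the rule that a task becomes suspect after missing a number of consecutive intervals is an assumption, since the text specifies only that a heartbeat is sent at a given interval. Time is passed in explicitly so the example runs without sleeping.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class HeartbeatCollector:
    """Data collector side: remembers when each task last sent a heartbeat."""
    interval: float      # expected seconds between heartbeats
    tolerance: int = 3   # missed intervals allowed before suspicion (assumed rule)
    last_seen: Dict[str, float] = field(default_factory=dict)

    def heartbeat(self, task_id: str, now: float) -> None:
        self.last_seen[task_id] = now

    def suspected(self, now: float) -> List[str]:
        """Return tasks whose last heartbeat is older than interval * tolerance."""
        limit = self.interval * self.tolerance
        return sorted(t for t, seen in self.last_seen.items() if now - seen > limit)


class LocalMonitor:
    """Local monitor side: checks one task process and reports to the collector."""

    def __init__(self, task_id: str, collector: HeartbeatCollector,
                 is_alive: Callable[[], bool]) -> None:
        self.task_id = task_id        # identity of the monitored task process
        self.collector = collector    # where heartbeats are sent
        self.is_alive = is_alive      # hypothetical liveness probe for the process

    def tick(self, now: float) -> bool:
        """Called once per heartbeat interval; sends a heartbeat only if the task is alive."""
        if self.is_alive():
            self.collector.heartbeat(self.task_id, now)
            return True
        return False
```

In this sketch the three values handed to the local monitor in the text map onto `task_id`, `collector`, and the collector's `interval`; a caller would invoke `tick` once per interval and poll `suspected` on the collector side.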