The Organic Grid: Self-organizing Computational Biology on Desktop Grids - Parallel Computing for Bioinformatics and Computational Biology

27.2.6.5 Size of Result Burst

Each agent of an ITA ranks its children on the

basis of the time taken to send some results to this node. The time required to obtain

just one result-burst, or a result-burst of size 1, might not be a good measure of the

performance of a child. Nodes might make poor decisions about which children to

keep and discard. The child propagation algorithm benefits from using the average of

R result-burst intervals and from setting r, the result-burst size, to be greater than 1.

A better measure for the performance of a child is the time taken by a node to obtain

$r \times (R + 1)$ results. However, r and R should not be set to very large values because

the overlay network would take too much time to take form and to get updated.
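
The measure above can be sketched as follows. This is only an illustration under stated assumptions: `ChildTimer` and its methods are hypothetical names, not taken from the original text, and the sketch reads the r * (R + 1) figure as R + 1 completed bursts of r results each, which yield R intervals to average.

```python
from collections import deque


class ChildTimer:
    """Hypothetical helper: scores one child by its average result-burst interval."""

    def __init__(self, r, R):
        self.r = r          # results per burst (assumed >= 1)
        self.R = R          # number of burst intervals to average (assumed >= 1)
        self.pending = 0    # results seen so far in the current burst
        # Completion times of the last R + 1 bursts delimit R intervals.
        self.burst_times = deque(maxlen=R + 1)

    def record_result(self, now):
        """Register one result from the child, received at time `now`."""
        self.pending += 1
        if self.pending == self.r:
            self.burst_times.append(now)
            self.pending = 0

    def score(self):
        """Average burst interval, or None until r * (R + 1) results have arrived."""
        if len(self.burst_times) < self.R + 1:
            return None
        return (self.burst_times[-1] - self.burst_times[0]) / self.R
```

A parent could keep one such timer per child and rank children by `score()`, lower being better. Larger values of r and R smooth the measure but delay the first usable score, which matches the caution about slow overlay formation.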

27.2.6.6 Fault Tolerance

If the parent of a node were to become inaccessible

due to machine or link failures, the node and its own descendants would be disconnected

from the tree. The application might require that a node remain in the tree at all

times. In this scenario, the node must be able to contact its parent's ancestors. Every

node keeps a list, of constant size a, of its ancestors. This list is updated every time

its parent sends it a message. The updates to the ancestor-list take into account the

possibility of the topology of the overlay network changing frequently.
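
The text does not say what a parent's message carries, so the following is a sketch under an assumption: each message from the parent includes the parent's own ancestor-list, and the child rebuilds its list from that. The function name and parameters are hypothetical.

```python
def updated_ancestor_list(parent_id, parent_ancestors, a):
    """Rebuild a child's ancestor-list from its parent's message.

    Assumption (not from the source): the parent piggybacks its own
    ancestor-list on each message. The result is ordered from the most
    distant ancestor to the parent, and keeps at most `a` entries.
    """
    if a <= 0:
        return []
    chain = list(parent_ancestors) + [parent_id]
    return chain[-a:]  # keep only the a nearest ancestors
```

Rebuilding the list on every message, rather than patching it, is one simple way to absorb frequent changes in the overlay topology: a stale entry disappears as soon as the parent's own list no longer contains it.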

A child sends a message to its parent, the $a$th node in its ancestor-list. If it is

unable to contact the parent, it sends a message to the $(a - 1)$th node in that list. This

goes on until an ancestor responds to this node's request. The ancestor becomes the

parent of the current node and normal operation resumes.
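
The fallback procedure described above can be sketched as below. This is a minimal illustration, not the pseudo-code of the original figure: `find_new_parent` and the `try_contact` callback are hypothetical names, and the ancestor-list is assumed to be ordered with the parent as its last entry.

```python
def find_new_parent(ancestors, try_contact):
    """Walk the ancestor-list from the parent (last entry) toward the root.

    `try_contact(node)` is a hypothetical callback that returns True if
    `node` answered the request. Returns (new_parent, remaining_list),
    or (None, []) if no ancestor could be reached.
    """
    remaining = list(ancestors)
    while remaining:
        candidate = remaining[-1]
        if try_contact(candidate):
            return candidate, remaining
        remaining.pop()  # unreachable: drop it and try the next ancestor up
    return None, []
```

For example, with `ancestors = ["root", "g", "p"]` and only `"p"` unreachable, the call returns `"g"` as the new parent together with the shortened list `["root", "g"]`.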

If a node's ancestor-list goes down to size 0, it attempts to obtain the address of

some other agent by checking its data distribution and communication overlays. If

these are the same as the scheduling tree, the node has no means of obtaining any more

work to do. The mobile agent informs the agent environment that no useful work is

being done by this machine, before self-destructing. The environment begins to send

out requests for work to a list of friends. The pseudo-code for the fault tolerance

algorithm is shown in Figure 27.8.

To recover from the loss of tasks by failing nodes, every node keeps track of

unfinished subtasks that were sent to children. If a child requests additional work

and no new task can be obtained from the parent, unfinished tasks are handed out

again.
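
A minimal sketch of this bookkeeping follows. The class `TaskPool`, its methods, and the `fetch_from_parent` callback are hypothetical names introduced for illustration; tasks are assumed to be hashable identifiers, and reissuing the oldest unfinished task first is a design choice of the sketch, not something the text specifies.

```python
from collections import deque


class TaskPool:
    """Hypothetical per-node record of subtasks handed out to children."""

    def __init__(self, tasks):
        self.fresh = deque(tasks)  # tasks not yet handed to any child
        self.outstanding = {}      # unfinished task -> child holding it

    def next_task(self, child, fetch_from_parent=lambda: None):
        """Return a task for `child`, or None if nothing is left to do."""
        if not self.fresh:
            new_task = fetch_from_parent()  # hypothetical request to the parent
            if new_task is not None:
                self.fresh.append(new_task)
        if self.fresh:
            task = self.fresh.popleft()
        elif self.outstanding:
            # No new work available: hand out the oldest unfinished task again.
            task = next(iter(self.outstanding))
            del self.outstanding[task]
        else:
            return None
        self.outstanding[task] = child
        return task

    def complete(self, task):
        """Mark `task` as finished; duplicate completions are ignored."""
        self.outstanding.pop(task, None)
```

Because a reissued task may eventually be completed by two children, `complete` tolerates a second report for the same task.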

Figure 27.8 Fault tolerance: contacting ancestors.
