Biomedical Engineering Reference
27.2.6.5 Size of Result Burst Each agent of an ITA ranks its children on the
basis of the time taken to send some results to this node. The time required to obtain
just one result-burst, or a result-burst of size 1, might not be a good measure of the
performance of a child. Nodes might make poor decisions about which children to
keep and discard. The child propagation algorithm benefits from using the average of
R result-burst intervals and from setting r , the result-burst size, to be greater than 1.
A better measure for the performance of a child is the time taken by a node to obtain
r(R + 1) results. However, r and R should not be set to very large values because
the overlay network would take too much time to take form and to get updated.
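The averaging described above can be sketched as follows. This is a minimal illustration, not the book's implementation: the names ChildTimer, record_result, average_interval, and rank_children are hypothetical, and the sketch assumes that a burst interval is the time between the completion of two consecutive bursts of r results. Under that reading, R intervals need R + 1 bursts, which matches the r(R + 1) results mentioned in the text.

```python
from collections import deque


class ChildTimer:
    """Hypothetical per-child bookkeeping; not taken from the source."""

    def __init__(self, r, R):
        if r < 1 or R < 1:
            raise ValueError("r and R must be at least 1")
        self.r = r                        # result-burst size
        self.pending = 0                  # results seen in the current burst
        self.last_burst_end = None        # time the previous burst completed
        self.intervals = deque(maxlen=R)  # keeps only the last R intervals

    def record_result(self, now):
        """Register one result received from the child at time `now`."""
        self.pending += 1
        if self.pending < self.r:
            return
        # A full burst of r results has arrived.
        self.pending = 0
        if self.last_burst_end is not None:
            self.intervals.append(now - self.last_burst_end)
        self.last_burst_end = now

    def average_interval(self):
        """Average of the last R burst intervals, or None until R exist."""
        if len(self.intervals) < self.intervals.maxlen:
            return None
        return sum(self.intervals) / len(self.intervals)


def rank_children(timers):
    """Return child names, fastest first; unmeasured children go last."""
    def key(name):
        avg = timers[name].average_interval()
        return (avg is None, avg if avg is not None else 0.0)
    return sorted(timers, key=key)
```

With r = 2 and R = 2, a child is not ranked until six results have arrived, which shows why large values of r and R delay the formation of the overlay.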
27.2.6.6 Fault Tolerance If the parent of a node were to become inaccessible
due to machine or link failures, the node and its own descendants would be disconnected
from the tree. The application might require that a node remain in the tree at all
times. In this scenario, the node must be able to contact its parent's ancestors. Every
node keeps a constant-size list containing a of its ancestors. This list is updated every time
its parent sends it a message. The updates to the ancestor-list take into account the
possibility of the topology of the overlay network changing frequently.
A child sends a message to its parent, the a-th node in its ancestor-list. If it is
unable to contact the parent, it sends a message to the (a - 1)th node in that list. This
goes on until an ancestor responds to this node's request. The ancestor becomes the
parent of the current node and normal operation resumes.
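The walk up the ancestor-list can be sketched as follows. This is an illustration under stated assumptions, not the pseudo-code of Figure 27.8: the function names and the try_contact callback are hypothetical, the list is assumed to be ordered with the most distant ancestor first and the parent last, and the update rule assumes that each message from the parent carries the parent's own ancestor-list.

```python
def update_ancestors(parent, parent_ancestors, a):
    """Rebuild the ancestor-list from a parent's message (assumed format).

    The parent is appended to its own ancestor-list and only the last
    `a` entries are kept, so the list stays at constant size.
    """
    return (list(parent_ancestors) + [parent])[-a:]


def find_new_parent(ancestors, try_contact):
    """Try the parent first, then each earlier ancestor in turn.

    `try_contact` is a hypothetical callable that takes an address and
    returns True if that node answered. Returns the responding ancestor
    and the shortened list, or (None, []) if nobody answered.
    """
    remaining = list(ancestors)
    while remaining:
        candidate = remaining[-1]      # a-th node, then (a - 1)th, ...
        if try_contact(candidate):
            return candidate, remaining
        remaining.pop()                # unreachable: drop it and move up
    return None, remaining
```

When find_new_parent returns None, the ancestor-list has gone down to size 0 and the node has to fall back on its other overlays.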
If a node's ancestor-list goes down to size 0, it attempts to obtain the address of
some other agent by checking its data distribution and communication overlays. If
these are the same as the scheduling tree, the node has no means of obtaining any more
work to do. The mobile agent informs the agent environment that no useful work is
being done by this machine, before self-destructing. The environment begins to send
out requests for work to a list of friends. The pseudo-code for the fault tolerance
algorithm is shown in Figure 27.8.
To recover from the loss of tasks by failing nodes, every node keeps track of
unfinished subtasks that were sent to children. If a child requests additional work
and no new task can be obtained from the parent, unfinished tasks are handed out
again.
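The bookkeeping for unfinished subtasks can be sketched as follows. The class TaskTracker and the fetch_from_parent callback are hypothetical names introduced for illustration. The sketch also makes two choices the text does not specify: the oldest unfinished subtask is reissued first, and a subtask is not reissued to the child that already holds it.

```python
class TaskTracker:
    """Hypothetical record of subtasks handed to children."""

    def __init__(self):
        # subtask id -> child currently holding it (insertion-ordered)
        self.outstanding = {}

    def hand_out(self, task_id, child):
        """Note that `task_id` was sent to `child`."""
        self.outstanding[task_id] = child

    def complete(self, task_id):
        """Forget a subtask once its result has come back."""
        self.outstanding.pop(task_id, None)

    def next_task(self, child, fetch_from_parent):
        """Pick work for `child`: a new task if any, else a reissued one.

        `fetch_from_parent` is a hypothetical callable returning a new
        task id, or None when the parent has nothing to give.
        """
        task = fetch_from_parent()
        if task is None:
            # No new work: reissue the oldest subtask held by another child.
            for task_id, holder in self.outstanding.items():
                if holder != child:
                    task = task_id
                    break
        if task is not None:
            # Re-insert so the task moves to the end of the reissue order.
            self.outstanding.pop(task, None)
            self.outstanding[task] = child
        return task
```

If the original holder later returns a result for a reissued subtask, complete simply removes the entry, so a duplicate result is harmless in this sketch.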
Figure 27.8 Fault tolerance: contacting ancestors.