Failures of components of the agent system: Agent places, or components of agent places, become
faulty, for example through faulty communication units or an incomplete agent directory.
These faults can result in agent failures, or in reduced or incorrect functionality of agents.
Failures of mobile agents: Mobile agents can become faulty due to faulty computation, or other
faults (e.g., node or network failures).
Network failures: Failures of the entire communication network or of single links can lead to
isolation of single nodes, or to network partitions.
Falsification or loss of messages: These are usually caused by failures in the network, in the
communication units of the agent systems, or in the underlying operating systems. Faulty
transmission of agents during migration also belongs to this type.
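
The network failure case above can be illustrated with a minimal sketch. It assumes the network is
given as a list of node names and a list of undirected links that are still up; the function name
`partitions` and the example topology are hypothetical and not part of the original text. The
function groups the nodes that can still reach each other, so an isolated node shows up as a group of
size one and a network partition shows up as several groups.

```python
from collections import deque


def partitions(nodes, links):
    """Return the groups of nodes that can still reach each other.

    `links` holds only the links that are currently up (assumed undirected).
    """
    neighbours = {n: set() for n in nodes}
    for a, b in links:
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen, groups = set(), []
    for start in nodes:
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from `start`.
        group, queue = set(), deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            group.add(node)
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        groups.append(group)
    return groups


# Hypothetical four-node chain A - B - C - D.
nodes = ["A", "B", "C", "D"]
all_links = [("A", "B"), ("B", "C"), ("C", "D")]

# Failure of link B-C splits the network into two partitions.
after_link_failure = partitions(nodes, [("A", "B"), ("C", "D")])

# Failure of link C-D isolates the single node D.
after_isolation = partitions(nodes, [("A", "B"), ("B", "C")])
```

More than one group in the result means that agents or messages cannot currently travel between the
groups.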
Node failures and their consequences are especially important in the intended scenario of parallel
applications. These consequences include the loss of agents and of node-specific resources. In general, each
agent has to fulfill a specific task to contribute to the parallel application, and thus, agent failures must
be treated with care. In contrast, in applications where a large number of agents are sent out to search
and process information in a network, the loss of one or several mobile agents might be acceptable
(Pleisch & Schiper, 2000, 2001).
Model failures
Machines, places, or agents can fail and recover later. A component that has failed but not yet recovered
is called down; otherwise, it is up. If it is eventually permanently up, it is called good (Aguilera, 2000).
In this chapter, we focus on crash failures (i.e., processes that halt prematurely). Byzantine failures,
whether benign or malicious, are not discussed. A failing place causes the failure of all agents running
on it. Similarly, a failing machine causes all places and agents on this machine to fail as well.
We do not consider deterministic, repetitive programming errors (i.e., programming errors that occur
on all agent replicas or places) in the code or the place as relevant failures in this context. Finally, a
link failure causes the loss of messages or agents currently in transmission on this link and may lead
to network partitioning. We assume that link failures (and network partitions) are not permanent. The
failure of a component (i.e., agent, place, machine, or communication link) can lead to blocking in the
mobile agent execution.
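
The containment rule of this failure model (a failing machine takes down its places, and a failing
place takes down its agents) can be sketched as follows. The class name `Component`, its fields, and
the example names are hypothetical. The sketch also assumes that recovering a component does not
automatically recover what was hosted on it; the text does not specify this.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """A machine, place, or agent that is either up or down."""
    name: str
    up: bool = True
    hosted: list = field(default_factory=list)  # components running on this one

    def crash(self):
        # A crash propagates to everything hosted on this component.
        self.up = False
        for child in self.hosted:
            child.crash()

    def recover(self):
        # Assumption: only this component comes back; hosted ones stay down.
        self.up = True


def down(component):
    """Return the names of all components at or below `component` that are down."""
    names = [] if component.up else [component.name]
    for child in component.hosted:
        names.extend(down(child))
    return names


# Hypothetical setup: one machine hosting two places with one agent each.
agent_1, agent_2 = Component("agent-1"), Component("agent-2")
place_1 = Component("place-1", hosted=[agent_1])
place_2 = Component("place-2", hosted=[agent_2])
machine = Component("machine", hosted=[place_1, place_2])

place_1.crash()                 # only place-1 and agent-1 go down
down_after_place = down(machine)

machine.crash()                 # now every place and agent on the machine is down
down_after_machine = down(machine)
```

A place failure stays local to that place, while a machine failure affects every place and agent on
the machine, which is why the failure of a single component can block the execution of a mobile agent.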
Figure 2. The redundant places mask the place failure