Modeling Fault Tolerant and Secure Mobile Agent Execution in Distributed Systems - Distributed Artificial Intelligence, Agent Technology, and Collaborative Applications


• Failures of components of the agent system: Agent places, or components of agent places, can become faulty, for example, through faulty communication units or an incomplete agent directory. These faults can result in agent failures, or in reduced or wrong functionality of agents.

• Failures of mobile agents: Mobile agents can become faulty due to faulty computation, or other faults (e.g., node or network failures).

• Network failures: Failures of the entire communication network or of single links can lead to isolation of single nodes, or to network partitions.

• Falsification or loss of messages: These are usually caused by failures in the network or in the communication units of the agent systems, or the underlying operating systems. Also, faulty transmission of agents during migration belongs to this type.
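The effect of link failures described in the list above (isolated nodes or network partitions) can be illustrated with a small sketch. It is not taken from the chapter: the function name `partitions`, the node names, and the representation of the network as an undirected list of links are assumptions made purely for illustration.

```python
from collections import defaultdict


def partitions(nodes, links, failed_links=()):
    """Return the groups of nodes that can still reach each other.

    Hypothetical helper: links are undirected (a, b) pairs, and
    failed_links lists the links that are currently down.
    """
    failed = {frozenset(link) for link in failed_links}
    adjacency = defaultdict(set)
    for a, b in links:
        if frozenset((a, b)) not in failed:
            adjacency[a].add(b)
            adjacency[b].add(a)

    seen, groups = set(), []
    for start in nodes:
        if start in seen:
            continue
        # Depth-first search collects every node reachable from start.
        group, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(adjacency[node] - group)
        seen |= group
        groups.append(group)
    return groups


nodes = ["n1", "n2", "n3", "n4"]
links = [("n1", "n2"), ("n2", "n3"), ("n3", "n4")]

# All links up: one group containing every node.
print(partitions(nodes, links))
# The link n2-n3 fails: the network splits into two partitions.
print(partitions(nodes, links, failed_links=[("n2", "n3")]))
# The link n3-n4 fails: node n4 is isolated.
print(partitions(nodes, links, failed_links=[("n3", "n4")]))
```

More than one group in the result means that agents or messages cannot cross between the groups until the failed link recovers.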

Node failures and their consequences are especially important in the intended scenario of parallel applications. These consequences include the loss of agents and the loss of node-specific resources. In general, each agent has to fulfill a specific task to contribute to the parallel application, so agent failures must be treated with care. In contrast, in applications where a large number of agents are sent out to search and process information in a network, the loss of one or several mobile agents might be acceptable (Pleisch & Schiper, 2000, 2001).

Model failures

Machines, places, or agents can fail and recover later. A component that has failed but not yet recovered is called down; otherwise, it is up. If it is eventually permanently up, it is called good (Aguilera, 2000). In this chapter, we focus on crash failures (i.e., processes that halt prematurely). Malicious failures (i.e., Byzantine failures) are not discussed. A failing place causes the failure of all agents running on it. Similarly, a failing machine causes all places and agents on this machine to fail as well. We do not consider deterministic, repetitive programming errors (i.e., programming errors that occur on all agent replicas or places) in the code or the place as relevant failures in this context. Finally, a link failure causes the loss of messages or agents currently in transmission on this link and may lead to network partitioning. We assume that link failures (and network partitions) are not permanent. The failure of a component (i.e., agent, place, machine, or communication link) can lead to blocking in the mobile agent execution.
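The containment rule of this failure model (a crashed place takes down its agents, a crashed machine takes down its places and their agents) can be sketched as follows. The class `Component`, its methods, and the component names are hypothetical and not part of the chapter. The sketch also assumes that recovery applies to one component at a time, which the text does not specify.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Component:
    """A machine, place, or agent in the crash-failure model (illustrative)."""
    name: str
    up: bool = True
    hosted: List["Component"] = field(default_factory=list)

    def host(self, child: "Component") -> "Component":
        """Register a component that runs on this one and return it."""
        self.hosted.append(child)
        return child

    def crash(self) -> None:
        """Crash this component and everything hosted on it."""
        self.up = False
        for child in self.hosted:
            child.crash()

    def recover(self) -> None:
        """Bring this component back up.

        Assumption: hosted components stay down until they recover
        on their own; the chapter does not prescribe this.
        """
        self.up = True


machine = Component("machine-1")
place_1 = machine.host(Component("place-1"))
place_2 = machine.host(Component("place-2"))
agent_a = place_1.host(Component("agent-a"))

place_1.crash()
print(machine.up, place_1.up, place_2.up, agent_a.up)  # True False True False

machine.crash()
machine.recover()
print(machine.up, place_2.up)  # True False
```

In this sketch an agent that is down simply stops making progress, which corresponds to the blocking described above; masking such a failure requires redundancy, such as the redundant places of Figure 2.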

Figure 2. The redundant places mask the place failure
