In a time-triggered communication protocol, the error containment mechanisms for timing
message failures can be enforced transparently to the application. Using the a priori knowledge
concerning the global points in time of all intended message send and receive instants, autonomous
guardians can block timing message failures. For this purpose, node-local and centralized guardians
have been developed and validated for different time-triggered communication protocols (e.g.,
[BKS,Gua]). For example, in TTP, a guardian transforms a message which it judges untimely
into a syntactically incorrect message by cutting off its tail [Kop].
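The guardian behavior described above can be sketched as follows. This is a minimal illustration, not the TTP mechanism itself: the frame layout (one length byte, payload, CRC-32), the microsecond time base, and the names SendWindow, make_frame, is_well_formed, and guard are all assumptions introduced for the example.

```python
import zlib
from dataclasses import dataclass

CRC_LEN = 4  # assumed frame layout: 1 length byte, payload, 4-byte CRC-32


@dataclass(frozen=True)
class SendWindow:
    """A priori known transmission window of one sender (global time, microseconds)."""
    start_us: int
    end_us: int


def make_frame(payload: bytes) -> bytes:
    """Build a frame in the assumed layout (payload must be shorter than 256 bytes)."""
    body = bytes([len(payload)]) + payload
    return body + zlib.crc32(body).to_bytes(CRC_LEN, "big")


def is_well_formed(frame: bytes) -> bool:
    """Receiver-side syntax check: the length field and the CRC must both match."""
    if len(frame) < 1 + CRC_LEN or len(frame) != 1 + frame[0] + CRC_LEN:
        return False
    body, crc = frame[:-CRC_LEN], frame[-CRC_LEN:]
    return zlib.crc32(body).to_bytes(CRC_LEN, "big") == crc


def guard(window: SendWindow, tx_start_us: int, tx_end_us: int, frame: bytes) -> bytes:
    """Forward a timely frame unchanged; cut the tail off an untimely one."""
    timely = window.start_us <= tx_start_us and tx_end_us <= window.end_us
    return frame if timely else frame[:-CRC_LEN]
```

Because the truncated frame no longer matches its own length field, every receiver rejects it with the ordinary syntax check, so the timing failure is contained without any involvement of the application.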
Error containment for value message failures is generally not part of a time-triggered communication
protocol, but is the responsibility of the host computers. For example, value failure detection
and correction can be performed using N-modular redundancy (NMR). N replicas receive the same
requests and provide the same service. The output of all replicas is provided to a voting mechanism,
which selects one of the results (e.g., based on majority) or transforms the results to a single one
(average voter). The most frequently used N-modular configuration is triple-modular redundancy
(TMR). By employing three components and a voter, a single consistent value failure in one of the
constituent components can be tolerated.
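The two voter types mentioned above can be sketched in a few lines. The function names majority_vote and average_vote are assumptions made for this illustration, and a strict majority is used as the selection rule; other voting strategies are possible.

```python
from collections import Counter
from statistics import mean
from typing import Hashable, Optional, Sequence


def majority_vote(outputs: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the value delivered by a strict majority of replicas, or None."""
    if not outputs:
        return None
    value, count = Counter(outputs).most_common(1)[0]
    return value if count > len(outputs) // 2 else None


def average_vote(outputs: Sequence[float]) -> float:
    """Average voter: transform all replica results into a single value."""
    return mean(outputs)
```

In a TMR configuration, majority_vote([17, 17, 99]) masks the single faulty result and yields 17, whereas three mutually different results yield None because no majority exists.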
Although time-triggered communication protocols do not provide NMR natively, they enable it
by supporting replica determinism [Pol]. Fault-free
replicated components exhibit replica determinism if they deliver identical outputs in an identical
order within a specified time interval. Replica determinism simplifies the implementation of fault-
tolerance by active redundancy, as failures of components can be detected by carrying out a bit-by-bit
comparison of the results of replicas. Replica nondeterminism is introduced either by the interface
to the real world or the system's internal behavior.
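The following sketch illustrates why replica determinism matters for bit-by-bit comparison. The helper names encode and deviating_replicas, the replica labels, and the encoding of outputs as 8-byte IEEE 754 doubles are assumptions chosen for the example.

```python
import struct
from collections import Counter
from typing import Dict, Set


def encode(value: float) -> bytes:
    """Serialize a replica output as an 8-byte big-endian double (assumed format)."""
    return struct.pack(">d", value)


def deviating_replicas(outputs: Dict[str, bytes]) -> Set[str]:
    """Return the replicas whose output differs bit-by-bit from the majority output."""
    if not outputs:
        return set()
    reference, count = Counter(outputs.values()).most_common(1)[0]
    if count <= len(outputs) // 2:
        raise ValueError("no majority output to compare against")
    return {name for name, out in outputs.items() if out != reference}
```

With deterministic replicas, a call such as deviating_replicas({"A": encode(21.5), "B": encode(21.5), "C": encode(99.0)}) identifies replica C. If fault-free replicas delivered slightly different values, for example because each sampled its own sensor at a different instant, the exact comparison would find no majority even though no replica has failed.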
14.4.4 Diagnostic Services
Diagnostic services are concerned with the identification of failed subsystems. Diagnostic services
can trigger the autonomous recovery of a system in case of a transient subsystem failure. In addition,
diagnostic services can support the replacement of defective subsystems if a failure is permanent.
An example of a diagnostic service that can be found in time-triggered communication protocols
is a solution to the membership problem. The membership problem is a fundamental problem in
distributed computing, because solving it enables solutions to other important problems in designing fault-
tolerant systems [GP]. The membership problem is defined as the problem of achieving agreement
on the identity of all correctly functioning processes of a process group. A process is correct if its
behavior complies with the specification. Otherwise, the process is denoted as faulty.
In the context of an integrated architecture, it makes sense to establish membership information for
FCRs, as FCRs can be expected to fail independently. Depending on the assumed types of faults, an
FCR is either an entire system component or a subsystem within a component (e.g., a task) dedicated
to a function.
A service that implements an algorithm for solving the membership problem and offers consistent
membership information is called a membership service. A membership service simplifies the
provision of many application algorithms, as the architecture offers generic error detection capabilities
via this service. Applications can rely on the consistency of the membership information and react
to detected failures of FCRs as indicated by the membership service.
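How an application might consume such a service can be sketched as follows. The class MembershipView and its methods are hypothetical and do not correspond to the interface of any particular protocol; the sketch also assumes that the underlying service has already reached agreement before update is called, so the agreement algorithm itself is not shown.

```python
from typing import Callable, Dict, FrozenSet, Iterable, List


class MembershipView:
    """Hypothetical application-side view of consistent membership information."""

    def __init__(self, fcrs: Iterable[str]) -> None:
        # All FCRs are assumed to be correct at start-up.
        self._alive: Dict[str, bool] = {fcr: True for fcr in fcrs}
        self._listeners: List[Callable[[str], None]] = []

    def on_failure(self, callback: Callable[[str], None]) -> None:
        """Register an application reaction to a detected FCR failure."""
        self._listeners.append(callback)

    def update(self, fcr: str, alive: bool) -> None:
        """Called by the (assumed) membership service after agreement on a change."""
        was_alive = self._alive[fcr]
        self._alive[fcr] = alive
        if was_alive and not alive:
            for callback in self._listeners:
                callback(fcr)

    def members(self) -> FrozenSet[str]:
        """Return the FCRs currently considered correct."""
        return frozenset(fcr for fcr, ok in self._alive.items() if ok)
```

The application registers a callback once and is notified only on the transition from correct to failed, so it needs no error detection logic of its own for FCR failures.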
A membership service also plays an important role in controlling application-level fault-tolerance
mechanisms that deal with failures of functions. If a function fails because more FCRs have failed than
can be tolerated by the given amount of redundancy, all that an integrated architecture can do is
to inform other functions about this condition so that they can react accordingly through application-level
fault-tolerance mechanisms.
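The condition under which a function has to be reported as failed can be expressed as a simple check against the membership information. The name function_has_failed and the parameter min_correct (the number of correct replicas the function needs, for example 2 for TMR with majority voting) are assumptions made for this sketch.

```python
from typing import Iterable


def function_has_failed(replica_fcrs: Iterable[str],
                        members: Iterable[str],
                        min_correct: int) -> bool:
    """True if fewer replica FCRs are correct than the redundancy scheme requires."""
    correct = len(set(replica_fcrs) & set(members))
    return correct < min_correct
```

For a function replicated on three FCRs with min_correct set to 2, the loss of one FCR is tolerated, while the loss of a second one makes the check return True, at which point the other functions would have to be informed.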
In a time-triggered communication system, the periodic message send times are membership
points of the sender [KGR]. Every receiver knows a priori when a message of a sender is supposed
 