Image Processing Reference
14.3.1 Fault, Error, and Failure
A failure [Lap] occurs when the delivered service deviates from fulfilling the functional specifica-
tion. An error is that part of the system state which is liable to lead to a subsequent failure. A failure
occurs when the error reaches the service interface. A fault is the adjudged or hypothesized cause of
an error. As stated in Ref. [ALR], the concept of fault is introduced to stop recursion.
Due to the recursive definition of systems, a failure at a particular level of decomposition can be
interpreted as a fault at the next upper level of decomposition, thereby leading to a hierarchical causal
chain.
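The hierarchical causal chain described above can be illustrated with a minimal sketch. The `Level` class, the `causal_chain` function, and the level names are hypothetical and chosen only for illustration; the sketch also assumes the worst case in which no error is contained at any level, so every failure is seen as a fault one level up.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Level:
    """One level of decomposition; `parent` is the next upper level (hypothetical model)."""
    name: str
    parent: Optional["Level"] = None


def causal_chain(origin: Level) -> List[Tuple[str, str]]:
    """List the fault -> error -> failure steps from `origin` up to the top level.

    Assumption: nothing stops the propagation, so the failure at each level
    is interpreted as a fault at the next upper level.
    """
    chain: List[Tuple[str, str]] = []
    level: Optional[Level] = origin
    while level is not None:
        chain.append((level.name, "fault"))    # adjudged or hypothesized cause
        chain.append((level.name, "error"))    # erroneous part of the system state
        chain.append((level.name, "failure"))  # error reaches the service interface
        level = level.parent                   # failure here is a fault one level up
    return chain
```

For a hypothetical three-level decomposition (`chip` inside `node` inside `system`), the fourth entry of the chain is `("node", "fault")`, which directly follows `("chip", "failure")`.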
14.3.2 Fault Containment
A fault containment region (FCR) is defined as a subsystem that operates correctly regardless of any
arbitrary logical or electrical fault outside the region [LH]. The justification for building
ultra-reliable systems from replicated resources rests on an assumption of failure independence among
redundant units. For this reason, the independence of FCRs is of critical importance [BCV]. The
independence of FCRs can be compromised by shared physical resources (e.g., power supply, timing
source), external faults (e.g., electromagnetic interference, spatial proximity), and design.
14.3.2.1 Error Containment
Although an FCR can restrict the immediate impact of a fault, fault effects manifested as erroneous
data can propagate across FCR boundaries using the communication system. For this reason, the
system must also provide error containment [LH] to avoid error propagation through message
failures. A message failure can be either a message value failure or a message timing failure [CASD].
A message value failure occurs if the data contained in a message are incorrect. A message timing
failure means that the message send or receive instants are not in agreement with the specification.
Error containment involves an independent component for error detection and mediation of a
component's access to the shared network. The error detection and mediation mechanisms must
be part of a different FCR than the message sender [Kop]. Otherwise, the error containment
mechanisms may be impacted by the same fault that caused the message failure.
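The distinction between the two kinds of message failure can be sketched as a check that an independent error-detection component might run. All names here (`Message`, `MessageSpec`, `classify`) and the microsecond time base are assumptions made for illustration. Note that a plausibility check like `is_valid_payload` can only detect some value failures, since an incorrect value may still look plausible.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Set


class MessageFailure(Enum):
    VALUE = "message value failure"
    TIMING = "message timing failure"


@dataclass(frozen=True)
class Message:
    sender: str
    payload: int
    send_instant_us: int  # observed send instant (assumed unit: microseconds)


@dataclass(frozen=True)
class MessageSpec:
    """Hypothetical specification of one message: allowed send window and value check."""
    earliest_us: int
    latest_us: int
    is_valid_payload: Callable[[int], bool]


def classify(msg: Message, spec: MessageSpec) -> Set[MessageFailure]:
    """Return the set of failures detected for `msg`; an empty set means none detected."""
    failures: Set[MessageFailure] = set()
    # Timing failure: send instant not in agreement with the specification.
    if not (spec.earliest_us <= msg.send_instant_us <= spec.latest_us):
        failures.add(MessageFailure.TIMING)
    # Value failure: data contained in the message judged incorrect.
    if not spec.is_valid_payload(msg.payload):
        failures.add(MessageFailure.VALUE)
    return failures
```

In line with the text, such a check would have to run in a different FCR than the sender, so that the fault causing the message failure cannot also disable the check.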
One can distinguish two types of error containment [Rus]:
Spatial Partitioning: Spatial partitioning ensures that software in one FCR cannot alter
the code or private data of another FCR. Spatial partitioning also prevents an FCR from
interfering with control of external devices (e.g., actuators) of other FCRs.
Temporal Partitioning: Temporal partitioning ensures that an FCR cannot affect the ability
of other FCRs to access shared resources, such as the common network or a shared CPU.
This includes the temporal behavior of the services provided by resources (latency, jitter,
duration of availability during a scheduled access).
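As an illustration of temporal partitioning on a shared network, the following minimal sketch assumes a static schedule with equal-length slots, one per FCR, repeated cyclically. This scheme, the function names, and the microsecond time base are assumptions for illustration and are not taken from the text above.

```python
from typing import Sequence


def slot_owner(now_us: int, slot_len_us: int, schedule: Sequence[str]) -> str:
    """Return the FCR that owns the shared network at time `now_us`.

    Assumes equal-length slots and a schedule that repeats every
    len(schedule) * slot_len_us microseconds.
    """
    if slot_len_us <= 0 or not schedule:
        raise ValueError("need a positive slot length and a non-empty schedule")
    round_len_us = slot_len_us * len(schedule)
    return schedule[(now_us % round_len_us) // slot_len_us]


def may_transmit(fcr: str, now_us: int, slot_len_us: int, schedule: Sequence[str]) -> bool:
    """Grant network access only inside the FCR's own slot."""
    return slot_owner(now_us, slot_len_us, schedule) == fcr
```

With the hypothetical schedule `["A", "B", "C"]` and 100 µs slots, an attempt by `B` to transmit at 50 µs is refused, so a faulty `B` cannot take away the access that `A` is entitled to in its slot.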
The probability of preventing the propagation of the consequences of an error is called the
error-containment coverage. Error containment is a prerequisite for building fault-tolerant systems
because, without it, a single fault can corrupt the whole system.
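The text does not say how the error-containment coverage is determined. One possible approach, assumed here purely for illustration, is to estimate it as the fraction of observed errors whose consequences did not propagate, for example over the outcomes of a fault-injection campaign. The function name and the input format are hypothetical.

```python
from typing import Iterable


def estimate_coverage(contained: Iterable[bool]) -> float:
    """Estimate error-containment coverage from per-error outcomes.

    `contained` holds one boolean per observed error: True if its consequences
    were prevented from propagating, False otherwise (assumed input format).
    """
    outcomes = list(contained)
    if not outcomes:
        raise ValueError("at least one observed error is required")
    # Relative frequency of contained errors as a point estimate of the probability.
    return sum(outcomes) / len(outcomes)
```

Such a point estimate is only as good as the sample behind it; a small number of observed errors gives little confidence in the resulting value.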
14.4 Fundamental Services of a Time-Triggered Communication Protocol
In the following, four fundamental services of a time-triggered communication protocol are
explained, namely, clock synchronization, the periodic exchange of state messages, fault isolation,
and diagnostic services.
 