controllability, reproducibility, intrusiveness, etc.) should be preferred. However,
if different behaviors are observed, then the techniques are rather complementary.
Such an insight is very helpful in light of the recent work devoted to developing
dependability benchmarks (Kanoun and Spainhower 2008), 3 in particular to
substantiate which kind of "faultload" is relevant for such benchmarks.
The four techniques described in Section 8.3.2 (heavy-ion radiation, pin-level
injection, electromagnetic interference, and compile-time SWIFI) were jointly
applied and analyzed. To carry out all the fault injection experiments on a
consistent basis, we used the same distributed testbed architecture, featuring five
MARS 4 nodes, and a common test scenario. The assessment of the fault injection
techniques relies on the error detection mechanisms (EDMs) built into a MARS node,
which serve as "observers" to characterize the erroneous behaviors induced by the
faults that each technique injects. The EDMs are meant to provide the MARS node
with self-checking properties that give it a fail-silent behavior 5 . The experiments
were also aimed at assessing the extent to which this property was ensured.
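The use of EDMs as observers can be illustrated with a minimal sketch. It assumes that each experiment has already been reduced to a single outcome label: the EDM that fired, a fail-silence violation, or no observed error. The labels `edm_a` and `edm_b`, the function names, and the sample data are hypothetical placeholders chosen for illustration; they are not the actual MARS mechanisms or measured results.

```python
from collections import Counter

# Hypothetical outcome labels (assumptions, not taken from the MARS study).
FAIL_SILENCE_VIOLATION = "fail_silence_violation"
NO_ERROR = "no_error_observed"


def detection_distribution(outcomes):
    """Fraction of each outcome among experiments where an error was observed."""
    counts = Counter(o for o in outcomes if o != NO_ERROR)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: n / total for label, n in counts.items()}


def fail_silence_coverage(outcomes):
    """Fraction of observed errors caught by some EDM (no fail-silence violation)."""
    errors = [o for o in outcomes if o != NO_ERROR]
    if not errors:
        return None  # no error was observed, so coverage is undefined
    detected = sum(1 for o in errors if o != FAIL_SILENCE_VIOLATION)
    return detected / len(errors)


# Made-up outcomes for one technique, for illustration only.
sample = ["edm_a", "edm_a", "edm_b", FAIL_SILENCE_VIOLATION, NO_ERROR]
print(detection_distribution(sample))
print(fail_silence_coverage(sample))  # 0.75: three of the four errors were detected
```

Computing the same distribution for each injection technique gives one way to compare the error sets they produce: similar distributions would suggest that the techniques are interchangeable, whereas clearly different ones would suggest that they are complementary.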
In what follows, we briefly present the relevant features of the target system and
of the testbed. Some typical results are then presented and the main insights gained
are summarized.
8.3.4.1 The Target System and the Testbed
We focus here on the fault tolerance features of a special-purpose processing node
designed to support the fail-silent property. Special attention is paid to the
identification and characterization of the error detection mechanisms (EDMs) built
into a MARS node.
The MARS Processing Node. This study uses a single-board implementation of
the MARS node. More details on MARS features and on the architecture of the
processing nodes can be found in Reisinger et al. (1995). Each node consists of two
independent processing units: the application unit and the communication unit. Each
unit is based on a 68070 CPU, featuring a memory management unit (MMU). The
application unit also contains a dynamic RAM and two bidirectional FIFOs, one of
which serves as an interface to external add-on hardware, while the other connects
the application unit to the communication unit. Additional hardware for the
communication unit comprises a static RAM, two Ethernet controllers (LANCEs), each
3 These efforts also include the IFIP WG 10.4 SIG on Dependability Benchmarking
(http://www.dependability.org/wg10.4/SIGDeB) and the European Project on
Dependability Benchmarking, DBench, Project IST 2000-25425 (http://www.laas.fr/dbench).
4 MARS (MAintainable Real-time System) is a distributed system developed at Vienna
University of Technology, which has evolved into the TTA and TTP concepts (Kopetz
and Bauer 2003).
5 Fail silence is intended to describe the behavior of a computer that fails "cleanly"
by simply ceasing to send messages when a failure occurs (Powell 1994).
 