Physical Fault Models and Fault Tolerance - Models in Hardware Testing

End-to-End EDMs. These mechanisms include end-to-end checksums for message data and multiple (in practice, double) executions of tasks.

The end-to-end checksums detect corruption of message data exchanged between two nodes of an FTU. They are checked by the receiving task and thereby extend the fail silence property of the MARS nodes.
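The checksum idea can be illustrated with a minimal sketch. The text does not specify which checksum MARS uses, so CRC-32 is an assumption here, and the names `seal` and `unseal` are hypothetical. The sending task appends the checksum and the receiving task discards any message whose checksum does not match.

```python
import struct
import zlib

CHECKSUM_BYTES = 4  # size of a CRC-32 trailer (the checksum choice is an assumption)


def seal(payload: bytes) -> bytes:
    """Sending task: append a CRC-32 of the payload to the message."""
    return payload + struct.pack(">I", zlib.crc32(payload))


def unseal(message: bytes):
    """Receiving task: return the payload, or None if the message is corrupted."""
    if len(message) < CHECKSUM_BYTES:
        return None
    payload, trailer = message[:-CHECKSUM_BYTES], message[-CHECKSUM_BYTES:]
    (expected,) = struct.unpack(">I", trailer)
    if zlib.crc32(payload) != expected:
        return None  # treat the corrupted message as if it had never been sent
    return payload
```

Because the check runs in the receiving task rather than in the communication hardware, it covers the whole path between the two tasks, which is what makes it end-to-end.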

Double execution of tasks in time redundancy can detect errors caused by transient faults that make the two instances of the task produce different output data. Combined with message checksums, task execution in time redundancy forms the highest level in the hierarchy of the error detection mechanisms. These mechanisms also trigger the execution of a trap instruction, which causes a reset of the node.
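Time-redundant execution can be sketched as follows, assuming the task is a deterministic function of its inputs. The exception `NodeReset` is a hypothetical stand-in for the trap instruction and node reset described above, and `run_in_time_redundancy` is an illustrative name, not part of MARS.

```python
class NodeReset(Exception):
    """Hypothetical stand-in for the trap that resets the node."""


def run_in_time_redundancy(task, inputs):
    """Execute the task twice on the same inputs and compare the outputs."""
    first = task(inputs)
    second = task(inputs)
    if first != second:
        # A transient fault disturbed one of the two executions.
        raise NodeReset("outputs of the two task instances differ")
    return first
```

A fault that affects both executions in the same way produces identical outputs and goes unnoticed by this check, which is why the text speaks of transient faults that cause different output data.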

8.3.4.2 The Experimental Framework

The testbed that supported the fault injection experiments at each site features five MARS nodes (Fig. 8.16). The node under test (NUT, for short) is the node subjected to the injection of a fault during each experiment run.

Another node (the golden node) serves as a reference, and a third node (the comparator node) compares the messages sent by the two previous nodes. When the comparator node observes a discrepancy (a fail silence violation) or the NUT detects an error, the NUT is declared to have failed and is then shut down by the comparator node to clear all error conditions before the subsequent experiment run. After some time, power is restored and the NUT is reloaded for the next run. The data generation node simulates the data of the real-time application that is used to activate the NUT and the golden node during each fault injection experiment.
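The comparator node's decision for one experiment run can be sketched as below. This is only an illustration under stated assumptions: the names `Outcome`, `classify_run`, and `nut_must_be_shut_down` are hypothetical, messages are modelled as comparable values in sending order, and giving a message discrepancy precedence over an error detected by the NUT is an assumption the text does not confirm.

```python
from enum import Enum


class Outcome(Enum):
    NO_FAILURE = "no failure observed"
    ERROR_DETECTED = "error detected by the NUT"
    FAIL_SILENCE_VIOLATION = "fail silence violation"


def classify_run(nut_msgs, golden_msgs, nut_detected_error):
    """Compare the messages the NUT actually sent with those of the golden node."""
    for nut_msg, golden_msg in zip(nut_msgs, golden_msgs):
        if nut_msg != golden_msg:
            # The NUT sent a wrong message instead of staying silent.
            return Outcome.FAIL_SILENCE_VIOLATION
    if nut_detected_error:
        return Outcome.ERROR_DETECTED
    return Outcome.NO_FAILURE


def nut_must_be_shut_down(outcome):
    """The NUT is declared failed and shut down in both failure cases."""
    return outcome is not Outcome.NO_FAILURE
```

Only the messages the NUT actually sent are compared, so a NUT that detects an error and stops sending is not counted as violating fail silence.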

Fig. 8.16 The testbed architecture featuring five MARS nodes
