Physical Fault Models and Fault Tolerance - Models in Hardware Testing

8.3.2.4 Software-Implemented Fault Injection

For these experiments, the compile-time version of SWIFI was selected: faults were

injected at the machine code level and the mutilated application (code segment or

data segment) was loaded to the target system afterwards. Two main reasons led us

to select such an approach (Fuchs 1996):

1. Intrusiveness is reduced to a minimum, since faults are injected only into the

application software (no additional code, which might interfere with the

behavior of the application software, is needed).

2. Fault injection at the machine code level is capable of injecting faults that cannot

be injected at higher levels by using source code mutations.

The SWIFI experiments started at the Vienna University of Technology, Austria,

and continued at the Research and Technology Institute of Daimler-Benz AG (then

DaimlerChrysler) in Berlin, Germany.

Both the code and data segments of the application software used as the workload

for the experiments were targeted by the SWIFI technique. Within each segment, the

bit to be faulted was selected randomly to achieve a uniform distribution over the

whole segment. To facilitate the comparison with the HWIFI techniques, we only

consider here the single bit-flip experiments, because they constitute a reasonable

fault scenario for the comparison with these techniques (e.g., heavy-ion radiation

generates, to a large extent, single bit-flips).
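
The bit selection described above can be illustrated with a minimal sketch. It is not the tool used in the experiments: the function name flip_random_bit, the stand-in segment contents, and the seed are all hypothetical, and the sketch only assumes that a segment is available as a byte string before it is loaded to the target system.

```python
import random
from typing import Tuple


def flip_random_bit(segment: bytes, rng: random.Random) -> Tuple[bytes, int]:
    """Return a copy of `segment` with one bit inverted, plus that bit's index.

    The bit index is drawn uniformly over all bits of the segment, so every
    bit of the code or data image is equally likely to be faulted.
    """
    if not segment:
        raise ValueError("segment must not be empty")
    bit_index = rng.randrange(len(segment) * 8)   # uniform over the whole segment
    byte_index, bit_in_byte = divmod(bit_index, 8)
    mutated = bytearray(segment)                   # leave the original image untouched
    mutated[byte_index] ^= 1 << bit_in_byte        # single bit-flip
    return bytes(mutated), bit_index


# Hypothetical usage: a 16-byte stand-in for a code or data segment image.
rng = random.Random(1996)                          # fixed seed for a repeatable experiment
original_image = bytes(range(16))
faulty_image, faulted_bit = flip_random_bit(original_image, rng)
```

Returning the faulted bit index alongside the image keeps each experiment traceable, and seeding the generator makes a given injection repeatable.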

8.3.3 Representativeness with Respect to the F Set

In this section, we describe a general framework (Arlat and Crouzet 2002) that is

meant to help address comprehensively the representativeness issue.

From a pragmatic viewpoint, the main objective is to identify the technology

that is both necessary and sufficient to generate the F set to conduct a fault

injection test sequence. Several important issues have to be accounted for in this

effort.

8.3.3.1 System Levels and Fault Pathology

As shown in Fig. 8.14, several relevant levels of a computer system can be identified

where faults can occur and errors can be identified (e.g., physical-device, logic,

RTL, algorithmic, kernel, middleware, application, operation). Concerning faults,

these levels may correspond to levels where real faults are considered and (artificial)

faults can be injected. Concerning errors, the FTMs (especially, the error detection

mechanisms, EDMs) provide convenient built-in monitors.
