SOFTWARE FAILURE MODE AND EFFECT ANALYSIS (SFMEA) - Software Design for Six-Sigma: A Roadmap for Excellence

in terms of what is included and excluded. In SFMEA, for example, potential

failure modes may include no FR delivered, partial and

degraded FR delivery over time, intermittent FR delivery, and unintended FR

(not intended in the mapping).

2. Identify potential failure modes: Failure modes indicate the loss of at least one

software FR. The DFSS team should identify all potential failure modes by

asking “in what way does the software fail to deliver its FRs?” as identified in

the mapping. A potential failure mode can be a cause or an effect in a higher

level subsystem, causing failure in its FRs. A failure mode may occur, but it

need not necessarily occur. Potential failure modes may be studied from the

baseline of past and current data, tests, and current baseline FMEAs.

For the software components, such information does not exist, and failure

modes are unknown (if a failure mode were known, then it would be

corrected). Therefore, the definition of failure modes is one of the hardest

parts of the FMEA of a software-based system (Haapanen et al., 2000). The

analysts have to apply their own knowledge about the software and postulate

the relevant failure modes. Reifer (1979) suggested failure modes in major

categories such as computational, logic, data I/O, data handling, interface, data

definition, and database. Ristord and Esmenjaud (2001) proposed five

general-purpose failure modes at a processing unit level: 1) the operating system stops,

2) the program stops with a clear message, 3) the program stops without a clear

message, 4) the program runs, producing obviously wrong results, and 5) the

program runs, producing apparently correct but, in fact, wrong results. Lutz and

Woodhouse (1999) divided the failure modes into those concerning either the data or the

processing of data. For each input and each output of the software component,

they considered four major failure mode classifications: 1) missing data (e.g.,

lost message or data loss resulting from hardware failure), 2) incorrect data

(e.g., inaccurate or spurious data), 3) timing of data (e.g., obsolete data or data

arrives too soon for processing), and 4) extra data (e.g., data redundancy or

overflow). For each step in processing, they considered the following four failure

modes: 1) halt/abnormal termination (e.g., hung or deadlocked at this point),

2) omitted event (e.g., event does not take place, but execution continues), 3)

incorrect logic (e.g., preconditions are inaccurate; event does not implement

intent), and 4) timing/order (e.g., event occurs in wrong order; event occurs

too early or too late). Becker and Flick (1996) give the following classes of

failure modes: 1) hardware or software stop, 2) hardware or software crash,

3) hardware or software hang, 4) slow response, 5) startup failure, 6) faulty

message, 7) checkpoint file failure, 8) internal capacity exceeded, and 9) loss

of service. They also listed detection methods based on Haapanen et al.

(2002):

A task heartbeat monitor is coordination software that detects a missed

function task heartbeat

A message sequence manager checks the sequence numbers for messages to

flag messages that are not in order
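
The two detection methods above can be illustrated with a minimal Python sketch. It is not taken from Becker and Flick or Haapanen et al.; the class names (HeartbeatMonitor, SequenceChecker), the timeout_s parameter, and the choice to pass the current time in explicitly are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatMonitor:
    """Hypothetical monitor: flags tasks whose last heartbeat is too old."""
    timeout_s: float                      # assumed maximum silence per task
    last_seen: dict = field(default_factory=dict)

    def beat(self, task: str, now: float) -> None:
        # Record the time at which the task last reported in.
        self.last_seen[task] = now

    def missed(self, now: float) -> list:
        # A task has missed its heartbeat if it has been silent
        # for longer than the timeout.
        return sorted(task for task, seen in self.last_seen.items()
                      if now - seen > self.timeout_s)


@dataclass
class SequenceChecker:
    """Hypothetical manager: flags messages that arrive out of sequence."""
    expected: int = 0                     # next sequence number expected
    flagged: list = field(default_factory=list)

    def receive(self, seq: int) -> bool:
        in_order = (seq == self.expected)
        if not in_order:
            self.flagged.append(seq)
        # Only ever move the expectation forward, so a late message
        # is never accepted as current.
        self.expected = max(self.expected, seq + 1)
        return in_order


monitor = HeartbeatMonitor(timeout_s=2.0)
monitor.beat("io", now=0.0)
monitor.beat("ctrl", now=0.0)
monitor.beat("ctrl", now=3.0)             # "io" stays silent
late_tasks = monitor.missed(now=4.0)

checker = SequenceChecker()
results = [checker.receive(seq) for seq in (0, 1, 3, 2, 4)]
```

In this sketch a message that skips ahead (3) and the late message that follows it (2) are both flagged. Whether a real sequence manager should flag the early arrival, the late one, or both is a design decision the source does not specify.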
