Information Technology Reference
In-Depth Information
in specified environments and with a desired confidence, can
be specified, designed in, predicted, tested and demonstrated.”
In material product engineering, reliability is extremely important.
Manufacturers want to reduce their costs as much as possible, and sometimes
do so by using inferior-quality raw materials or by not adhering to
approved production standards. This may not be the right strategy, because
it can increase the cost of supporting the product once it is out in the
field.
Although the above may have been used in the context of purely
mechanical systems, it also has relevance to the software industry. Conscious
efforts must be made to weave reliability through the life cycle of the
development by incorporating reliability models in one's software right from
the product planning stage. The architecture and design (scalability, per-
formance, code reuse, third-party components, etc.) set the stage for the
reliability of the product. Further along, coding styles, adherence to good
software engineering practices, and adequate developer testing help in
creating reliable code. Subsequent testing, using meaningful in-house data
and field data (if the customer is using a system already), helps in ascer-
taining how reliably the software will work in a production environment.
An analysis of bugs discovered in the software, which requires proper
documentation of all bugs found in-house or at the customer site, irre-
spective of their severity (see Chapter 12 on quality), is another valuable
tool. Reliability engineering uses various distribution models (Weibull
being the most widely used) to estimate the success rates of manufactured
products and to ensure that those rates are balanced against the
business goals of the customer.
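To make the Weibull model mentioned above concrete, the following is a minimal sketch of its two standard quantities: the reliability (survival) function R(t) = exp(-(t/scale)^shape) and the hazard (instantaneous failure) rate. The function names and the parameter values are hypothetical, chosen only for illustration. In practice the shape and scale would be estimated from in-house or field failure data.

```python
import math


def weibull_reliability(t: float, shape: float, scale: float) -> float:
    """Survival probability R(t) = exp(-(t / scale) ** shape) for t >= 0."""
    if t < 0 or shape <= 0 or scale <= 0:
        raise ValueError("t must be >= 0; shape and scale must be > 0")
    return math.exp(-((t / scale) ** shape))


def weibull_hazard(t: float, shape: float, scale: float) -> float:
    """Failure rate h(t) = (shape / scale) * (t / scale) ** (shape - 1) for t > 0."""
    if t <= 0 or shape <= 0 or scale <= 0:
        raise ValueError("t, shape and scale must be > 0")
    return (shape / scale) * (t / scale) ** (shape - 1)


if __name__ == "__main__":
    # Hypothetical parameters, not taken from any real product.
    shape, scale = 1.5, 1000.0
    for hours in (100, 500, 1000):
        r = weibull_reliability(hours, shape, scale)
        h = weibull_hazard(hours, shape, scale)
        print(f"t = {hours:5d} h  R(t) = {r:.3f}  h(t) = {h:.6f} per hour")
```

A shape below 1 describes a failure rate that falls over time (early-life failures), a shape of exactly 1 gives a constant failure rate (the exponential model), and a shape above 1 describes a rising failure rate (wear-out). This flexibility is one reason the distribution is popular in reliability work.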
Accountability for Failure
Systems often fail due to human error. Typical errors result from
incomplete information about how to handle the system, or from a failure to
follow a long set of procedures while using it. Failures can also occur when
accountability has not been properly established in the operational hier-
archy of the system. If users or operators are not fully aware of their roles
and responsibilities (especially those who are monitoring the behavior of
the system), they tend to assume that reporting a potential problem may
not be their job. They assume that the other guy or the manager will
ultimately spot it. Critical problems can easily be missed this way. The
following is a good checklist to ensure that no part of the system is passed
over: