This is a crucial point of failure because if a host bus adapter fails during important I/O
operations, data could be lost. Unfortunately, there is no reliable way to predict when an
HBA will fail, because most electronic devices simply fail without warning. Some, however,
do exhibit warning signs, such as intermittent disconnections from attached storage devices
or I/O operations that are dropped or take longer than normal.
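The two warning signs named above, dropped I/O and I/O that takes longer than normal, can be watched for by comparing recent latencies against a healthy baseline. The following is a minimal sketch under stated assumptions. The helper name assess_hba, the "mean plus three standard deviations" threshold, and the 5 percent slow-I/O tolerance are hypothetical choices made for illustration. They are not taken from any vendor tool. Real monitoring would feed it latencies collected by the operating system or the HBA's management software.

```python
from __future__ import annotations

from statistics import mean, stdev


def assess_hba(baseline_ms: list[float], recent_ms: list[float | None],
               k: float = 3.0, max_slow_ratio: float = 0.05) -> dict:
    """Flag an HBA as suspect from I/O latency samples (hypothetical helper).

    baseline_ms: latencies recorded while the adapter was known to be healthy
                 (at least two samples, needed for the standard deviation).
    recent_ms:   latest latencies; None marks an I/O that was dropped or timed out.
    """
    # Assumption: anything slower than mean + k standard deviations is "slower than normal".
    threshold = mean(baseline_ms) + k * stdev(baseline_ms)
    dropped = sum(1 for t in recent_ms if t is None)
    slow = sum(1 for t in recent_ms if t is not None and t > threshold)
    slow_ratio = slow / len(recent_ms) if recent_ms else 0.0
    return {
        "threshold_ms": threshold,
        "dropped": dropped,
        "slow": slow,
        # Any dropped I/O, or too large a share of slow I/O, counts as an early warning.
        "suspect": dropped > 0 or slow_ratio > max_slow_ratio,
    }


if __name__ == "__main__":
    healthy = [5.0, 5.2, 4.8, 5.1, 4.9]
    print(assess_hba(healthy, [5.0, 5.1, None, 12.0]))
```

A statistical threshold was chosen over a fixed millisecond limit because "normal" latency differs widely between storage arrays. Baselining each adapter against its own history avoids tuning a constant per device.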
Memory Failure Main memory is one of the core components of a computer system.
RAM performance is key to system performance, and memory failure is simply not an
acceptable outcome. Disks can be configured for backup and redundancy, but memory has far
fewer such options. A memory failure can crash an entire system, because the module that
failed may hold data that the system or its components are actively using. A computer may
not even start when a defective memory module is installed. It is therefore imperative to
check a system's memory regularly and to detect signs of failure ahead of time, preventing
costly and untimely downtime.
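One common way to check memory is a pattern test: write known bit patterns across a region, read them back, and report any location that does not match. The following is an illustrative sketch of that write-and-verify principle only. A Python process sees virtual memory, and the operating system and CPU caches can mask hardware faults. Real testers, such as memtest86+-style tools, therefore run outside the operating system. The pattern set and the function names here are assumptions chosen for the example.

```python
from __future__ import annotations

# Fill patterns a simple memory tester might cycle through (illustrative choice):
# all zeros, all ones, and the two alternating-bit patterns 01010101 / 10101010.
PATTERNS = (0x00, 0xFF, 0x55, 0xAA)


def find_mismatches(buf: bytearray, pattern: int) -> list[int]:
    """Return the offsets whose byte no longer holds the expected pattern."""
    return [i for i, b in enumerate(buf) if b != pattern]


def pattern_test(size: int = 1 << 16) -> dict[int, list[int]]:
    """Write each pattern across a buffer, read it back, and report bad offsets."""
    buf = bytearray(size)
    failures: dict[int, list[int]] = {}
    for pattern in PATTERNS:
        buf[:] = bytes([pattern]) * size      # write phase: fill the whole buffer
        bad = find_mismatches(buf, pattern)   # read-back phase: verify every byte
        if bad:
            failures[pattern] = bad
    return failures


if __name__ == "__main__":
    print(pattern_test())
```

The alternating patterns are included because a stuck or shorted bit that happens to match the all-zeros or all-ones fill would go unnoticed by those fills alone.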
NIC Failure The network interface card (NIC) is a computer system's gateway to the
network and beyond. It is the main communication interface and is essential for a distributed
system that is supposed to be accessible from anywhere in the world. However, a NIC failure
is also one of the more tolerable failures. Losing a NIC means losing connectivity, but it
does not bring down the system itself. There would be network downtime and the server might
be unreachable, but the failure is easily contained, and even prevented, through NIC
teaming/bonding or link aggregation. It is certainly not as fatal as a memory or CPU failure.
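The reason teaming contains a NIC failure is easiest to see in an active-backup policy. One NIC carries traffic, and when its link drops, traffic moves to the next NIC whose link is still up. The following is a minimal sketch of that selection logic, loosely resembling the active-backup mode of the Linux bonding driver. The Nic class, the select_active function, and the interface names eth0/eth1 are hypothetical. Link state would really come from the driver's link monitoring.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Nic:
    name: str       # hypothetical interface name, e.g. "eth0"
    link_up: bool   # link state as reported by link monitoring


def select_active(team: list[Nic], current: str | None = None) -> str | None:
    """Active-backup policy: keep the current NIC while its link is up,
    otherwise fail over to the first NIC in the team whose link is up."""
    by_name = {nic.name: nic for nic in team}
    if current in by_name and by_name[current].link_up:
        return current
    for nic in team:
        if nic.link_up:
            return nic.name
    return None  # every link is down: only now is the host truly unreachable


if __name__ == "__main__":
    team = [Nic("eth0", True), Nic("eth1", True)]
    active = select_active(team)              # eth0 carries traffic
    team[0].link_up = False                   # eth0 fails
    active = select_active(team, active)      # traffic moves to eth1
    print(active)
```

This sketch deliberately keeps the surviving NIC active even after the original one recovers, rather than failing back. That choice avoids a second interruption; whether to fail back is a policy choice.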
CPU Failure The central processing unit (CPU) is the brain of the computer, hence the
word central. A CPU failure means utter and total failure; in terms of cost and lost
productivity, it is one of the worst kinds of failure that can occur in your system. It causes
a complete shutdown of the system, and most operations in progress will be unrecoverable.
The CPU is also one of the most expensive parts of the system and one of the hardest to
replace. Unlike a NIC, HBA, or disk, which can simply be plugged into an expansion slot or
drive bay, a failed CPU must be completely removed from its socket before a replacement
can be installed.
Summary
This chapter focused on the performance of the infrastructure rather than the virtualized
environment of cloud computing. We concentrated on how the main hardware components
perform and how they fail. The most prominent of these components is the disk drive, which
is also the largest bottleneck of the system. The speed and performance of disk drives have
hardly improved since the 1990s, but their capacity and affordability have improved by
leaps and bounds. It is this relatively weak performance that we examined. For a disk, the
key performance indicators are access time and data transfer rate. The access time is the
time it takes for the mechanical parts to position the read/write head over the track and
sector that contains the requested data. Taken into account are the spindle speed, which
rotates the disk and is measured