Over decades of research, many investigations into disk reliability have been
conducted.
2.1.1.1 Disk failure modes
There are several disk failure modes. In general, they can be grouped into two
categories: partial disk failure and permanent disk failure.
2.1.1.1.1 Partial disk failures
This type of disk failure affects only part of the disk's storage space, while the
rest remains functional. Only a few existing works study partial disk failures. A
notable exception is the type of partial disk failure commonly referred to as “bad
sectors”, which has been relatively well studied since the 1990s. Bad sectors appear
as inaccessible data blocks or sectors during read or write operations. Their main
causes are wear of the platter surface, head crashes, manufacturing defects, and
tracking errors. Research on identifying and replacing bad sectors has been
conducted [27-29], and the industry has long provided useful tools for this purpose.
Detailed investigations of partial disk failures have also been carried out, in which
several fault-tolerance techniques were proposed to proactively guard against
permanent data loss caused by partial disk failures [30]. However, research focused
solely on analyzing partial disk failures is rare, because many of the solutions for
permanent disk failures can also recover data from a partially failed disk. For
example, data replication can be applied within a single disk to guard against bad
sectors [31]. A redundant array of inexpensive disks (RAID) can also improve data
reliability by storing additional parity information across multiple disks, an
approach that applies to both partial and permanent disk failures [32].
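The parity idea mentioned above can be illustrated with a minimal sketch. It assumes a RAID-4/5-style layout with a single XOR parity block per stripe and equally sized blocks; the helper names `xor_blocks`, `compute_parity`, and `reconstruct` are hypothetical and not taken from any real RAID implementation. Whether a block is lost to a bad sector or to a failed disk, XOR-ing the surviving blocks with the parity block recovers it.

```python
from functools import reduce
from operator import xor


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Return the byte-wise XOR of equally sized blocks."""
    if not blocks or len({len(b) for b in blocks}) != 1:
        raise ValueError("need at least one block, all of the same size")
    # zip(*blocks) walks the blocks column by column (one byte offset at a time)
    return bytes(reduce(xor, column) for column in zip(*blocks))


def compute_parity(data_blocks: list[bytes]) -> bytes:
    """Parity block stored on the extra disk of a stripe (hypothetical helper)."""
    return xor_blocks(data_blocks)


def reconstruct(surviving: list[bytes], parity: bytes) -> bytes:
    """Rebuild the single missing data block from the survivors plus parity."""
    return xor_blocks(surviving + [parity])


if __name__ == "__main__":
    stripe = [b"disk0blk", b"disk1blk", b"disk2blk"]  # one block per data disk
    parity = compute_parity(stripe)
    # Pretend disk 1 failed, or the sector holding its block became unreadable
    rebuilt = reconstruct([stripe[0], stripe[2]], parity)
    print(rebuilt == stripe[1])  # prints True
```

Single XOR parity tolerates the loss of exactly one block per stripe; surviving two simultaneous losses requires additional redundancy, such as a second, independently computed parity block.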
2.1.1.1.2 Permanent disk failures
The term “permanent disk failure” describes a disk failure in which the disk is
physically unrecoverable and must be replaced [3,17]. The causes of a permanent disk
failure can be complex and hard to identify. Damage to internal components, such as
the printed circuit board, the read-write head, or the motor, as well as firmware
failures, can all lead to a permanent disk outage. In general, when a permanent disk
failure occurs, the data stored on the disk is considered permanently lost.
Assuming the permanent disk failure mode is currently common practice in both disk
reliability and data reliability research [3,4,17,33]. The research presented here is
likewise based on the permanent disk failure mode. Therefore, the rest of this section
mainly reviews existing work on permanent disk failures.
2.1.1.2 Disk reliability metrics
In general, two metrics are widely used to describe permanent disk failure rates: the
mean time to failure (MTTF) and the annualized failure rate (AFR). MTTF is the length
of time that a device or other product is expected to last in operation. It indicates
how long a disk can reasonably be expected to work.
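To make the relation between the two metrics concrete, the following sketch converts an MTTF (in hours) into an AFR and back. It assumes an exponential lifetime model (constant failure rate) and continuous operation for 8,760 hours per year; neither assumption comes from the text, and the function names `afr_from_mttf` and `mttf_from_afr` are hypothetical.

```python
import math

# Assumption: the disk is powered on all year (24 x 365 hours) and its lifetime
# is exponentially distributed, i.e. it fails at a constant rate.
HOURS_PER_YEAR = 24 * 365


def afr_from_mttf(mttf_hours: float) -> float:
    """Probability of failing within one year, given MTTF in hours."""
    if mttf_hours <= 0:
        raise ValueError("MTTF must be positive")
    # 1 - exp(-T/MTTF), computed with expm1 for accuracy when T/MTTF is tiny
    return -math.expm1(-HOURS_PER_YEAR / mttf_hours)


def mttf_from_afr(afr: float) -> float:
    """Inverse conversion: MTTF in hours from an AFR strictly between 0 and 1."""
    if not 0.0 < afr < 1.0:
        raise ValueError("AFR must lie strictly between 0 and 1")
    # Solve AFR = 1 - exp(-T/MTTF) for MTTF; log1p(-afr) equals ln(1 - afr)
    return -HOURS_PER_YEAR / math.log1p(-afr)


if __name__ == "__main__":
    afr = afr_from_mttf(1_000_000)
    print(f"exact: {afr:.4%}, linear approximation: {HOURS_PER_YEAR / 1_000_000:.4%}")
```

Under these assumptions, an MTTF of 1,000,000 hours corresponds to an AFR of roughly 0.87%, and whenever the MTTF is much longer than a year the AFR is well approximated by 8,760 / MTTF.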
In industry, the MTTF of disks is obtained by running many or even many thousands