Biomedical Engineering Reference
In-Depth Information
can be located later. For example, each document can be assigned one or more keywords, but if the
keywords aren't appropriate, the keyword vocabulary is undefined or not enforced, or too few
keywords are used, then a document may be effectively lost in the system. Not only the choice and
number of keywords, but the indexing hierarchy can make data hard to find.
The process of data archiving is far more important than the associated technology, in that the best
software and hardware are useless if they aren't used. Of the technical issues involved in archiving
gene sequences, microarray analysis, and other bioinformatics data, scalability is typically the most
important. Even relatively small laboratories generate megabytes of data every week, which is
fueling demand for very-large-capacity archival storage devices.
One of the primary determinants of archive capacity is the storage media—the physical material used
to form a tape, disk, or cartridge. In addition to capacity, media can be characterized in terms of
compatibility, speed, data density, cost, volatility, durability, and stability. Compatibility is the ability
of media to function within a particular software and hardware environment. Speed is a multi-faceted
performance characteristic that encompasses both the time to locate data (seek time) and the time
to write it to or download it from the media (data transfer rate), all of which are functions of the
construction of the supporting hardware and electronics. Seek time may be several hundred
milliseconds for a CD-ROM, a few milliseconds for a hard drive, and a few microseconds for a flash
memory card. Capacity—the maximum amount of data the media can store—is a function of the
media construction, the tolerance of the casing or cartridge for tape- and disk-based media, and the
technology used to read and write the data. Capacity is also a function of data density, which is in
turn a function of the media, the drive mechanism, and the error coding and compression
technologies. Error-control and compression schemes in hard drives and other media allow higher
data densities than the raw media would support otherwise.
Cost is a function of the raw materials involved in the creation of media, but has more to do with
what the market will bear and what the competition has to offer. Volatility, a characteristic normally
ascribed to solid-state memory, refers to the status of the data when external power is removed.
Flash memory, like magnetic disk or tape, is considered relatively non-volatile, and can hold data for
years without loss.
Durability refers to the physical properties of the media that contribute to the longevity of the
surface, mechanisms, and housing, if any, during normal use. For example, the bearings and other
components in the rotational system of a hard drive undergo wear and tear over time. Stability
reflects the physical properties of the media in a given environment that contribute to the longevity
of the media and therefore the data, in a dormant state. For example, the bearings, metal, and
plastic parts are subject to the same problems that beset every complex electro-mechanical device.
Lubrication dries out, leaving bearings dry and without protection, rubber becomes brittle, plastic
parts deform, and dust and lint accumulate in the cooling system. Furthermore, the magnetic
patterns induced in the iron-oxide coating on the disk platters fade over the years, especially in the
heat. Similarly, the plastic-based optical media of a CD-ROM is susceptible to damage from high
humidity, rapid and extreme temperature fluctuations, and contamination from airborne pollution.
Over time, oil from our fingers can also damage the plastic surface of a CD-ROM. Fluctuations in
temperature and humidity can also cause shrinking and expansion of magnetic tape, distorting the
position of data tracks, resulting in data loss.
The longevity or life expectancy of the devices in an archive system is usually expressed in the Mean
Time Between Failure (MTBF) rating. The MTBF, an estimate of the failure rate of a device during its
expected lifetime, is one metric that can be used to estimate the life span of an archive. Typical MTBF
ratings for tape drives and commercial-grade hard drives are over 20 years. However, this figure
assumes ideal conditions of constant low temperature and humidity, freedom from biological agents,
static-electricity discharges, and mechanical abuse. Another consideration is that even if a tape
survives a decade or more in fireproof safe, it's likely that the data it contains will be inaccessible
because of changes in tape-drive standards. Most of the disk packs, tapes, and magnetic cartridges
that were standard archival media a decade ago are incompatible with current computer hardware.
Archives vary considerably in configuration and in proximity to the source data. For example, servers
typically employ several independent hard drives configured as a Redundant Array of Independent
Search WWH ::




Custom Search