facility. A related tactic is to use network-based storage from a third-party vendor and transfer data to
the offsite storage electronically. However, third-party archives carry greater security risks than
archives that are controlled and maintained locally.
Repurposing
One of the major benefits of having data readily available in an archive is the ability to repurpose it
for a variety of uses. For example, linear sequence data originally captured to discover new genes are
commonly repurposed to support the 3D visualization of protein structures.
One of the major issues in repurposing data is locating it efficiently in an archive. How difficult
data are to find once they have been incorporated into a storage system depends on the volume of
data involved. Efficient retrieval is a function of the hardware and database management software,
the effectiveness of the user interface, and the granularity of the index. For example, nucleotide
sequence data indexed only by chromosome number would be virtually impossible to locate if the
database contained thousands of sequences indexed to each chromosome, because a lookup on any one
chromosome would return thousands of candidates.
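To illustrate why index granularity matters, the following Python sketch indexes the same synthetic sequence records two ways: by chromosome alone, and by chromosome plus a start coordinate. The `SequenceRecord` type, its fields, and the generated data are hypothetical; a real archive would rely on the indexing provided by its database management system rather than an in-memory dictionary.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SequenceRecord:
    accession: str  # hypothetical identifier field
    chromosome: str
    start: int      # hypothetical start coordinate on the chromosome


def build_index(records, key):
    """Map each index key to the list of records that share it."""
    index = defaultdict(list)
    for rec in records:
        index[key(rec)].append(rec)
    return index


# Synthetic data: 5,000 sequences on each of two chromosomes.
records = [
    SequenceRecord(f"SEQ{chrom}_{i:05d}", chrom, i * 1000)
    for chrom in ("1", "2")
    for i in range(5000)
]

# Coarse index: one entry per chromosome.
coarse = build_index(records, key=lambda r: r.chromosome)
# Fine index: one entry per (chromosome, start) pair.
fine = build_index(records, key=lambda r: (r.chromosome, r.start))

print(len(coarse["1"]))         # 5000 candidates to sift through
print(len(fine[("1", 42000)]))  # 1 candidate
```

The coarse index answers every lookup with thousands of records that must then be scanned by hand or by a second query, whereas the finer key narrows the result to a single record.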
Issues in the repurposing phase of the data life cycle include the sensitivity, specificity, false
positives, and false negatives associated with searches. The usability of the user interface is also a
factor, including whether it supports free-text natural-language queries, search by example, or simple
keyword searching. In addition, security provisions can affect how easily data can be located and
repurposed. An overly complex security procedure that requires users to revalidate their identity
every five minutes could deter even the most well-intentioned researcher.
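The search-quality terms in this paragraph can be made concrete. The Python sketch below treats a search as returning a set of records and compares it with a known set of relevant records. Sensitivity is the fraction of relevant records the search found, TP / (TP + FN), and specificity is the fraction of irrelevant records it correctly left out, TN / (TN + FP). The function name and the toy data are hypothetical, and in practice the full set of relevant records is rarely known in advance.

```python
def search_metrics(retrieved, relevant, collection):
    """Compare a search's results against the records that should have been returned.

    retrieved  -- records the search returned
    relevant   -- records that are truly relevant to the query (ground truth)
    collection -- every record in the archive
    """
    retrieved, relevant, collection = set(retrieved), set(relevant), set(collection)
    tp = len(retrieved & relevant)               # relevant records found
    fp = len(retrieved - relevant)               # false positives: irrelevant records returned
    fn = len(relevant - retrieved)               # false negatives: relevant records missed
    tn = len(collection - retrieved - relevant)  # irrelevant records correctly excluded
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "sensitivity": sensitivity, "specificity": specificity}


# Hypothetical archive of 100 records, 10 of which are relevant to a query.
collection = range(100)
relevant = range(10)
retrieved = list(range(8)) + [50, 51, 52, 53]  # 8 hits plus 4 false positives

m = search_metrics(retrieved, relevant, collection)
print(m["sensitivity"], round(m["specificity"], 3))  # 0.8 0.956
```

Broadening a query generally raises sensitivity at the cost of more false positives (lower specificity), which is one reason the granularity of the index and the query options offered by the interface matter.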
Disposal
The duration of the data life cycle is a function of the perceived value of the data, the effectiveness of
the underlying process, and the limitations imposed by the hardware, software, and environmental
infrastructure. Eventually, all data die, either because they are intentionally disposed of once their
value has fallen below the cost of maintaining them, or because of accidental loss. Often, data must
be archived for legal reasons, even when they are of no intrinsic value to the institution or
researcher. For example, most official hospital or clinic patient records must be maintained for the
life of the patient. As a result, data are normally earmarked for disposal on the basis of their
quality and relevance rather than their age.
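The disposal criteria above can be expressed as a simple rule. The following Python sketch assumes that perceived value and maintenance cost can be estimated in comparable units, which is often the hardest part in practice. It marks a data set for disposal only when no legal retention requirement applies and its perceived value has fallen below the cost of keeping it. The `DataSet` fields and the sample entries are hypothetical; age is recorded but deliberately ignored, reflecting the point that disposal depends on quality and relevance rather than age.

```python
from dataclasses import dataclass


@dataclass
class DataSet:
    name: str
    perceived_value: float    # hypothetical value estimate, same units as cost
    annual_cost: float        # hypothetical cost of keeping the data another year
    legal_hold: bool = False  # True if law or policy requires retention
    age_years: float = 0.0    # recorded, but deliberately not used below


def eligible_for_disposal(ds: DataSet) -> bool:
    """Earmark data for disposal on value and legal status, not on age."""
    if ds.legal_hold:
        return False  # must be kept regardless of its value
    return ds.perceived_value < ds.annual_cost


# Hypothetical inventory.
inventory = [
    DataSet("old_but_useful_assay", perceived_value=500, annual_cost=50, age_years=12),
    DataSet("superseded_gene_run", perceived_value=10, annual_cost=50, age_years=1),
    DataSet("patient_record_0001", perceived_value=0, annual_cost=20,
            legal_hold=True, age_years=30),
]

for ds in inventory:
    print(ds.name, eligible_for_disposal(ds))
# old_but_useful_assay False
# superseded_gene_run True
# patient_record_0001 False
```

Note that the oldest entry is kept because of its legal hold, and the twelve-year-old assay is kept because it is still valuable, while the one-year-old run is the only candidate for disposal.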
Researchers in a laboratory working with sequence data might investigate single genes in turn,
moving from one gene to the next. When the sequence data for one gene are no longer needed, they can
be discarded from the local data warehouse to make room for the next gene's sequence data, whether
the data are stored on an internal disk in a Linux workstation or in a central data warehouse.
Managing the Life Cycle
Managing the data life cycle is an engineering exercise in balancing speed, completeness, longevity,
cost, usability, and security. For example, the media selected for archiving affect not only the cost
but also the speed of storage and the longevity of the data. Similarly, an in-house tape backup
facility may be more costly than outsourcing the task to a networked vendor, but the in-house
approach is likely to be more secure. These tradeoffs are reflected in the implementation of the
overall data-management process.