Biomedical Engineering Reference
In addition to cost, the major issues in the data-creation phase of the data life cycle include tool
selection, data format, standards, version control, error rate, precision, and accuracy. These metrics
apply equally to clinical and genomic studies. In particular, metrics such as error rate, precision, and
accuracy are more easily ascribed to machine-generated data, whether from clinical laboratory
studies or microarray analysis. For example, optical character recognition (OCR), which was once
used extensively as a means of acquiring sequence information from print publications, has an error
rate of about two characters per hundred, which is generally unacceptable.
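To put that figure in perspective, the following minimal Python sketch measures a per-character error rate and shows what a 2 percent rate implies for a longer sequence. The function names are illustrative, and the model is an assumption: it counts only substitution errors between equal-length strings and treats errors as independent, which real OCR output does not guarantee.

```python
def char_error_rate(reference: str, observed: str) -> float:
    """Fraction of positions at which two equal-length strings differ.

    Assumption: only substitution errors are counted. Insertions and
    deletions would require an alignment step that this sketch omits.
    """
    if len(reference) != len(observed):
        raise ValueError("strings must have equal length in this sketch")
    if not reference:
        raise ValueError("reference must be non-empty")
    mismatches = sum(r != o for r, o in zip(reference, observed))
    return mismatches / len(reference)


def expected_errors(rate: float, length: int) -> float:
    """Expected number of wrong characters in a sequence of the given length."""
    return rate * length


def prob_error_free(rate: float, length: int) -> float:
    """Probability that every character is correct, assuming independent errors."""
    return (1.0 - rate) ** length


if __name__ == "__main__":
    rate = 0.02  # about two characters per hundred, as cited in the text
    for length in (100, 1000):
        print(length, expected_errors(rate, length), prob_error_free(rate, length))
```

Under these assumptions, a 1,000-character sequence would be expected to contain about 20 wrong characters, and the chance of it being entirely correct is negligible.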
Subjective information created by hands-on clinical analysis and entered into the computer system
through manual transcription, voice-recognition data-input systems, or desktop or handheld
computers is much more difficult to validate. Moreover, there is significant variation in
subjective interpretation of clinical studies. For example, five seasoned radiologists will typically
provide five different interpretations of the same chest film or other radiographic study. In addition to
the quality of the initial clinical observation, there are errors introduced by the hardware, software,
and processes involved in capturing data, from keyboard and mouse to optical character recognition
and voice recognition.
The creation and acquisition of patient data raises several ownership and privacy concerns. One of
the greatest challenges regarding acquisition of clinical data is the Health Insurance Portability and
Accountability Act (HIPAA), which mandates security and privacy of patient data. The act requires all
health plans, clearinghouses, and providers of healthcare services to adopt national standards for
electronic transactions and information security by mid-2004. Technologies that support user
authentication, from password-protection schemes to biometric security technologies and data
encryption, are key to ensuring compliance with the act. Although there is not yet a parallel guideline
for genomic data, it is likely that legislation in this area will materialize as soon as public awareness
of the privacy issues becomes widely apparent.
Use
Once clinical and genomic data are captured, they can be put to a variety of immediate uses, from
simulation, statistical analysis, and visualization to communications. Issues at this stage of the data
life cycle include intellectual property rights, privacy, and distribution. For example, unless patients
have expressly given permission to have their names used, microarray data should be identified by
ID number through a system that maintains the anonymity of the donor.
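One way such an ID system might look is sketched below in Python. The `SampleRegistry` class, the `deidentify` function, and the field names `donor_name` and `sample_id` are hypothetical, introduced only for illustration. The sketch assumes that a random, opaque ID is acceptable and that the name-to-ID mapping is held separately from the distributed data. Removing the name alone does not guarantee anonymity if other fields can identify the donor.

```python
import secrets


class SampleRegistry:
    """Maps donor names to opaque sample IDs.

    The mapping lives only inside the registry, which is assumed to be
    stored and access-controlled separately from the microarray data.
    """

    def __init__(self):
        self._id_by_donor = {}

    def id_for(self, donor_name):
        """Return the donor's existing ID, or create a new random one."""
        if donor_name not in self._id_by_donor:
            new_id = "S-" + secrets.token_hex(8)
            # A collision is extremely unlikely, but re-draw if one occurs.
            while new_id in self._id_by_donor.values():
                new_id = "S-" + secrets.token_hex(8)
            self._id_by_donor[donor_name] = new_id
        return self._id_by_donor[donor_name]


def deidentify(record, registry):
    """Return a copy of the record with 'donor_name' replaced by 'sample_id'."""
    cleaned = {key: value for key, value in record.items() if key != "donor_name"}
    cleaned["sample_id"] = registry.id_for(record["donor_name"])
    return cleaned


if __name__ == "__main__":
    registry = SampleRegistry()
    raw = {"donor_name": "Jane Doe", "chip": "A1", "intensity": 1.5}
    print(deidentify(raw, registry))
```

Because the ID is random rather than derived from the name, it cannot be reversed without access to the registry, and repeated samples from the same donor still receive the same ID.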
Data Modification
Data are rarely used in their raw form, without some amount of formatting or editing. In addition,
data are seldom used only for their originally intended purpose, in part because future uses are
difficult to predict. For example, microarray data may not be captured expressly for comparison with
clinical pathology data, but they may serve that purpose well. The data dictionary is one means of
modifying data in a controlled way that ensures standards are followed. A data dictionary can be
used to tag all microarray data with time and date information in a standard format so that they can
be automatically correlated with clinical findings (see Figure 2-9).
Figure 2-9. Data Dictionary-Directed Data Modification. The time and date
header for microarray data can be automatically modified so that it can be
easily correlated with clinical findings.
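The idea of dictionary-directed modification can be sketched in a few lines of Python. Everything specific here is an assumption made for illustration: the `DATA_DICTIONARY` layout, the source names, the field names, and the choice of ISO 8601 as the standard format. The sketch shows only that the dictionary, not the code, records how each source writes its time and date.

```python
from datetime import datetime

# Hypothetical data dictionary: for each data source, the field that holds
# the time and date, and the strptime format in which that source writes it.
DATA_DICTIONARY = {
    "microarray": {"field": "scan_time", "format": "%m/%d/%Y %H:%M"},
    "clinical": {"field": "collected", "format": "%Y%m%d-%H%M%S"},
}

# Name of the standardized field (an assumption for this sketch).
STANDARD_FIELD = "timestamp"


def standardize(record, source):
    """Return a copy of the record whose time and date header is rewritten
    in ISO 8601 form, as directed by the data dictionary entry for `source`."""
    entry = DATA_DICTIONARY[source]
    parsed = datetime.strptime(record[entry["field"]], entry["format"])
    result = {key: value for key, value in record.items() if key != entry["field"]}
    result[STANDARD_FIELD] = parsed.isoformat(timespec="seconds")
    return result


if __name__ == "__main__":
    array_record = standardize({"scan_time": "03/14/2003 09:30", "chip": "A1"}, "microarray")
    lab_record = standardize({"collected": "20030314-093000", "test": "CBC"}, "clinical")
    # Both records now carry the same field in the same format, so they can
    # be matched by a direct comparison.
    print(array_record[STANDARD_FIELD] == lab_record[STANDARD_FIELD])
```

Adding a new data source then means adding one dictionary entry rather than writing new conversion code, which is what keeps the modification controlled.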