identifying how the patients' different genes are related to their characteristics. The
set of characteristics, or phenomena, for an individual is called a phenotype. Linking
genotypes and phenotypes has significantly accelerated genetic discovery. At first,
phenotypes were collected as in other research studies, at the time the biosamples
were collected: first obtaining consent from the patient, then asking a series of
questions to define the phenotype, collecting the sample, and matching it to the phenotype.
But this approach was both slow and expensive, especially since a large number of
subjects must be included in genotype-phenotype studies. Innovations in both
genetic sequencing and consent processes have increased the rate of genotype collection.
For phenotypes, the greatest innovation has been to extract the information from
data already collected as part of the clinical care process, namely from electronic
health records (EHRs).
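To make the idea of extracting phenotypes from routinely collected clinical data more concrete, the following is a minimal Python sketch of a rule-based case/control classifier over structured EHR records. Everything in it is a hypothetical illustration rather than a method from this chapter or any specific project: the record layout (`PatientRecord`), the function `classify_t2d`, and the simplified criteria (ICD-10-CM type 2 diabetes codes, category `E11`, on at least two distinct dates, or one coded date plus a diabetes medication). It is not a validated phenotype definition.

```python
from dataclasses import dataclass, field
from datetime import date

# Hypothetical, simplified record layout; real EHR extracts differ by institution.
@dataclass
class PatientRecord:
    patient_id: str
    # (diagnosis code, encounter date) pairs, assumed here to be ICD-10-CM codes
    diagnoses: list[tuple[str, date]] = field(default_factory=list)
    # Normalized medication names found anywhere in the record
    medications: set[str] = field(default_factory=set)

# Illustrative criteria only (assumptions, not a published algorithm).
T2D_CODE_PREFIX = "E11"  # ICD-10-CM category for type 2 diabetes mellitus
T2D_MEDICATIONS = {"metformin", "glipizide", "sitagliptin"}

def classify_t2d(record: PatientRecord) -> str:
    """Return 'case', 'control', or 'indeterminate' for a toy T2D phenotype."""
    # Distinct encounter dates on which a T2D diagnosis code was recorded
    t2d_dates = {d for code, d in record.diagnoses if code.startswith(T2D_CODE_PREFIX)}
    on_t2d_med = bool(record.medications & T2D_MEDICATIONS)

    # Case: T2D codes on two or more distinct dates, or one coded date plus a T2D drug
    if len(t2d_dates) >= 2 or (len(t2d_dates) >= 1 and on_t2d_med):
        return "case"
    # Control: no T2D codes and no T2D drugs anywhere in the record
    if not t2d_dates and not on_t2d_med:
        return "control"
    # Ambiguous evidence, e.g. a single code, or a drug with no supporting code
    return "indeterminate"

# Usage: classify a small hypothetical cohort
patients = [
    PatientRecord("p1", [("E11.9", date(2020, 1, 5)), ("E11.9", date(2020, 6, 2))]),
    PatientRecord("p2", [("E11.65", date(2021, 3, 1))], {"metformin"}),
    PatientRecord("p3", [("I10", date(2019, 8, 9))]),
    PatientRecord("p4", [], {"metformin"}),
]
cohort = {p.patient_id: classify_t2d(p) for p in patients}
# → {'p1': 'case', 'p2': 'case', 'p3': 'control', 'p4': 'indeterminate'}
```

Returning an explicit "indeterminate" label instead of forcing every patient into case or control is one reasonable design choice: patients with ambiguous evidence (for example, metformin alone, which may be prescribed for conditions other than diabetes) can simply be left out of a genotype-phenotype study rather than risk being misclassified.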
The ability to leverage electronic information in health records for genetic
research is an excellent example of translational informatics and its influence on the
future of biomedical research and healthcare. In this chapter, we discuss the
importance of EHRs for creating phenotypes and how they can be used. First we give
a brief history of the secondary use of EHR data and describe recent developments
driving the current interest in it. We review examples of projects that are successfully
leveraging EHRs for phenotyping, identifying both their successes and their challenges.
Finally, we discuss the future of phenotype extraction from EHRs and its impact on
genetic research as well as other health care research domains.
4.2 History of Secondary Use of Electronic Health Data
The idea of using EHR data beyond clinical care is not new. Health care is an
information-intensive field that generates large amounts of data, and for
decades researchers have recommended using these data in research studies. Many
of the Patient Outcomes Research Teams created by the Agency for Health Care
Policy and Research specifically used data from medical records and identified the
importance of using medical records rather than claims and billing information [2, 3].
Early informatics researchers succeeded both in demonstrating that EHR data can be
valuable for research and in identifying the challenges inherent in using these data
[4-7]. Even when very few health care institutions were collecting electronic health
data, researchers pursued the secondary use of such data for research.
This interest only increased as data mining and data warehousing of medical
databases grew in the late 1990s. The increased use of electronic research databases
in the 2000s, together with the desire both to populate these systems with existing
clinical data and to facilitate cohort identification and selection [8, 9], made secondary
data use a core part of the emerging discipline of translational informatics.
Perhaps nothing has increased the need for EHR data in research as much as
genomics has in the last few years [10]. A main reason was the emerging
need for genome-wide association studies (GWAS). Initial genetic discoveries
came from family-based studies. Researchers were studying diseases with a strong