As argued in the Wired magazine article "The End of Theory," we need a change in our way of thinking about science when faced with very large datasets [4]. Typical statistical approaches break down with sufficiently large data, and the challenge becomes to measure and interpret the data rather than to conduct hypothesis-driven experimentation [3, 4]. One of the supporting case studies in the Wired article is a tool that predicts crop yields¹ down to the level of individual fields by sifting through over 50 terabytes of data. In the legal² domain, big data analysis is replacing armies of lawyers with tireless computer algorithms for the task of discovery, i.e., producing relevant materials for a case by examining millions of emails, memoranda, and other documents to extract relevant information.³
It is time for biomedicine to embrace this trend, because historical barriers to the adoption of EHR systems are giving way to new Federal incentives, resulting in the collection of medical data at an unprecedented scale [5]. The Health Information Technology for Economic and Clinical Health (HITECH) Act calls for meaningful use objectives⁴ to measure progress, to sustain early adoption, and to provide accountability. In parallel, new kinds of datasets, such as next-generation sequencing datasets and personalized omic profiles [6, 7], are creating large amounts of data on individual patients that cannot be analyzed or reviewed by a doctor in a 15-minute office visit. New approaches to capture, store, analyze, and interpret such massive datasets are urgently needed.
7.2 The Kinds of Big Data in Medicine and Analyses They Enable
The discussion of Big Data in translational informatics frequently connotes next-generation sequencing data [8-10]. However, this is beginning to change: the use of large datasets of many other kinds has increased dramatically in recent years. 'Big Data' is an increasingly comprehensive term, encompassing both large amounts of molecular measurements on a single person (e.g., next-generation sequencing) and small amounts of routine measurements on a large number of people (e.g., clinical notes, lab measurements, claims data, and adverse event reports).
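
To make the contrast concrete, the following minimal Python sketch (our illustration, not from the chapter; every variable name and size is an invented placeholder) fabricates the two shapes of data side by side: a deep molecular profile of one person, and a shallow set of routine measurements on millions of people.

    import numpy as np

    rng = np.random.default_rng(0)

    # Shape 1: millions of molecular measurements on ONE person,
    # e.g., genotype calls (0, 1, or 2 alternate alleles) per variant site.
    n_variants = 3_000_000
    one_patient_genome = rng.integers(0, 3, size=n_variants, dtype=np.int8)

    # Shape 2: a handful of routine measurements on MANY people,
    # e.g., age, systolic blood pressure, and LDL cholesterol for a cohort.
    n_patients = 3_000_000
    routine = {
        "age": rng.integers(18, 90, size=n_patients),
        "systolic_bp": rng.normal(120, 15, size=n_patients),
        "ldl": rng.normal(110, 30, size=n_patients),
    }

    # Both are "big," but along different axes: the first is deep and
    # narrow, the second shallow and wide.
    print(f"deep: {n_variants:,} measurements on 1 person")
    print(f"wide: {len(routine)} measurements on {n_patients:,} people")

The point of the sketch is only the shape distinction; the deep profile invites questions about one individual's biology, while the wide one invites population-level pattern mining.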
Imagine how scientific inquiry, and the ability of our healthcare system to 'learn' [5], would be different if we collected and shared access to lots of data, both genomic and "routine." How will the kinds of questions we ask change when we cross a certain data threshold [3, 11]? Outside of healthcare and biomedicine, a small amount of data about millions of individuals is already being collected and mined by Web
¹ http://www.wired.com/science/discoveries/magazine/16-07/pb_feeding
² http://www.wired.com/science/discoveries/magazine/16-07/pb_lawsuit
³ http://www.nytimes.com/2011/03/05/science/05legal.html
⁴ http://edocket.access.gpo.gov/2010/pdf/2010-17207.pdf