As indicated in the introduction to this section, many additional technologies are emerging that will open up the exploration of new dimensions of patient data space.
5. The 'data explosion' requires that new analytic tools be created for capturing, validating, storing, mining, integrating and finally modeling all of these biological data sets, thus helping to convert them into knowledge. A critical point is that these software solutions must be driven by the needs of leading-edge biology and medicine and by biological domain expertise. One big revolution in medicine is that we will create massive amounts of digital data for the 'quantified self' of each individual that will transform our ability to monitor and optimize our own wellness. The following sections discuss in detail the transformation of big data sets to medically relevant information.
Computational integration of 'quantified self' data will revolutionize health. Information on the quantified self provides enormous potential for the future of P4 medicine, as we are able to harness this information productively through powerful data analysis and large-scale computation (Figure 23.9). The key issue is how such large repositories of data can be turned into actionable knowledge. The potential of this endeavor is enormous, as we will gain unprecedented detail about how our bodies work, what brings about disease, and how wellness can be maintained. Interpreting multifaceted biological data deeply for each individual, and integrating it broadly across populations, will open new vistas of biological knowledge and clinical power. The pace of the technological changes will be quick, driven by exponentially rising computational power to take advantage of the exponentially rising amounts of high-throughput biological data. This is P4 medicine's heritage from the digital revolution.
Realizing the power of high-dimensional diagnostics requires overcoming very significant data analysis challenges. This exciting future will only be realized as we address very significant computational and data analysis challenges. The human body is an enormously complex dynamic system interacting with an ever-changing and diverse environment. As the capacity to make molecular measurements continues to increase in scope and precision, the challenge of finding the relevant signals amidst the sea of observations can be daunting. As with any complex system, causality is often difficult to find, and there are many ways that systems can break down and result in disease. Our bodies have enormously intricate and beautiful approaches for dealing with disease, for example via the immune system, and thus the residual medical problems we must solve must consider the consequences of the highly adaptive protective immune responses: both those that have been successful and those that were unsuccessful (such as the failure to check malignant cancers). Thus, these problems are often highly challenging, including from an informational point of view.
The primary challenge of big data in biology is to separate relevant signal from noise, including both technical noise (from measurements) and biological noise (from biological factors other than those of interest). The number of measurements that come from the quantified self presents significant hazards for proper interpretation. Having an increasing ability to make precise measurements is exciting because so much new information is available, but care must be taken not to build overly complex models that appear very good on initial data assessment but fail when moved forward towards potential clinical use. Using an overly complex model that fits the already observed data really well, but then does not maintain accuracy when applied to new data for the same phenomenon (i.e., the model is fitting noise rather than the true underlying relationship), is called overfitting. In biological and clinical studies with 'omics' data, we are typically in what statisticians refer to as the small sample size regime. That is, we have very many more variables than we do observations. For example, there are tens of thousands of different transcripts measured in a human transcriptome (the variables), but generally only of the order of 100 or so samples (the observations) in a given study. This is exactly the opposite of the situation desired to reliably use measurements to distinguish classes and establish reliable relationships among the variables (e.g., transcripts): one would like to have very many observations relative to the number of variables being used. Because the number of variables is so much larger than the number of observations, such models can easily fit noise, and great care must be taken to guard against overfitting.
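To make this risk concrete, here is a minimal Python sketch on purely synthetic data (the sample and variable counts are illustrative assumptions, not taken from any real study): with far more variables than observations, a simple least-squares classifier can fit completely signal-free 'expression' data perfectly on the training samples while performing no better than chance on new samples.

```python
import numpy as np

rng = np.random.default_rng(0)

n_samples, n_variables = 100, 5000   # ~100 samples vs. thousands of transcripts (p >> n)

# Purely synthetic "expression" matrix and labels containing NO real signal.
X_train = rng.standard_normal((n_samples, n_variables))
y_train = rng.choice([-1.0, 1.0], size=n_samples)

# Minimum-norm least-squares fit: with p >> n it can interpolate the training labels.
w, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
train_acc = np.mean(np.sign(X_train @ w) == y_train)

# Fresh samples drawn from the same (signal-free) distribution.
X_test = rng.standard_normal((n_samples, n_variables))
y_test = rng.choice([-1.0, 1.0], size=n_samples)
test_acc = np.mean(np.sign(X_test @ w) == y_test)

print(f"training accuracy: {train_acc:.2f}")   # ~1.00: the model has memorized noise
print(f"test accuracy:     {test_acc:.2f}")    # ~0.50: chance level on new data
```

Evaluating on held-out samples, as in this sketch, is the basic guard against mistaking such memorization of noise for a true biological relationship.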
Four factors will be important for dealing with the striking signal-to-noise issues of large data sets: (1) the integration of similar data types from different laboratories to enormously enlarge the data sets analyzed (see below); (2) the integration of data of different types, including molecular, cellular, conventional medical and phenotypic data; (3) the transformation of these data into the 'network of networks' for each individual patient, and following the dynamics of the 'network of networks'; and (4) the use of subtractive biological analyses to eliminate various forms of biological noise, as described in the prion discussion (see above). Each of these approaches represents significant computational/mathematical, technical and biological challenges, some of which are illustrated in the following discussion.
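As a toy illustration of factor (1), the Python sketch below (all numbers are invented for illustration) simulates the same transcript measured in two laboratories with different systematic offsets. Standardizing each laboratory's measurements against its own controls before pooling removes the lab-specific shift, and the enlarged combined data set then separates cases from controls more clearly than either laboratory alone.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_lab(n_per_group, offset, noise_sd, effect=1.0):
    """Synthetic expression values for controls and cases in one laboratory."""
    controls = offset + rng.normal(0.0, noise_sd, n_per_group)
    cases = offset + effect + rng.normal(0.0, noise_sd, n_per_group)
    return controls, cases

# Two labs measuring the same transcript with different systematic offsets.
ctrl_a, case_a = simulate_lab(n_per_group=40, offset=10.0, noise_sd=1.0)
ctrl_b, case_b = simulate_lab(n_per_group=40, offset=13.0, noise_sd=1.0)

def zscore(values, reference):
    """Standardize against a per-lab reference (here, that lab's controls)."""
    return (values - reference.mean()) / reference.std(ddof=1)

# Per-lab standardization before pooling removes the lab-specific shift.
pooled_ctrl = np.concatenate([zscore(ctrl_a, ctrl_a), zscore(ctrl_b, ctrl_b)])
pooled_case = np.concatenate([zscore(case_a, ctrl_a), zscore(case_b, ctrl_b)])

def separation(ctrl, case):
    """Difference in group means relative to its standard error."""
    se = np.sqrt(ctrl.var(ddof=1) / len(ctrl) + case.var(ddof=1) / len(case))
    return (case.mean() - ctrl.mean()) / se

print("lab A alone       :", round(separation(ctrl_a, case_a), 2))
print("naive pooling     :", round(separation(np.concatenate([ctrl_a, ctrl_b]),
                                               np.concatenate([case_a, case_b])), 2))
print("normalized pooling:", round(separation(pooled_ctrl, pooled_case), 2))
```

Note that naive pooling can actually score worse than a single laboratory, because the unremoved lab offset acts as technical noise; only after per-lab normalization does the larger combined data set pay off.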