biological sciences and constructing predictive models
from them will require approaches more akin to those
employed by physicists, climatologists, and researchers in other strongly
quantitative disciplines that have mastered the collection
and predictive modeling of high-dimensional data.
Research to date has tended to focus on
single data dimensions, whether constructing co-expression
networks based on gene expression data, carrying out
genome-wide association analyses based on DNA variation
information, or constructing protein interaction networks
based on protein-protein interaction data. Although we
achieve some understanding in this way, progress is limited
because none of the dimensions on their own provide
a complete enough context within which to interpret results
fully. This type of limitation has become apparent in
genome-wide association studies (GWAS) or whole-exome
or genome sequencing studies, where thousands of loci
have been identified and highly replicated as
associated with disease, but our understanding of disease is
still limited because the genetic loci do not necessarily
inform on the gene affected, on how gene function is
altered, or more generally, how the biological processes
involving a given gene are altered at particular points of
time or in particular contexts [3–5,14].
A MOVIE ANALOGY FOR MODELING
BIOLOGICAL SYSTEMS
The full suite of interacting parts in living systems, from the
molecular to the ecological level, if they could be viewed
collectively over time, would enable us to achieve a more
complete understanding of cellular, organ, and organism-
level processes, in much the same way as we achieve
understanding by watching a movie. The continuous flow
of information in a movie enables our minds to exercise an
array of priors that provide the appropriate context and
which constrain the possible relationships (structures) not
only within a given frame or scene, but over the entire
course of the movie. As our senses take in all of the
streaming audio and visual information, our internal
network reconstruction engine (centered at the brain)
pieces the information together to represent highly complex
and non-linear relationships depicted in the movie, so that
in the end we are able to achieve an understanding of what
the movie intends to convey at a hierarchy of levels.
If, instead of viewing a movie as a continuous stream of
frames of coherent pixels and sound, we viewed single
dimensions of the information independently, understanding
would be difficult if not impossible to achieve.
For example, consider a 1.5-hour feature-length film comprising
162 000 frames (30 frames per second), where each
frame consists of 1280 × 720 (roughly one million) pixels.
One way to view the film would be as a single frame in
which the intensity value for each pixel across all 162 000
frames was averaged. This gross aggregate average would
provide very little, if any, information regarding the movie,
not unlike our attempts to understand complex living
systems by examining single snapshots of a subset of
molecular traits in a single cell type and in a single context
at a single point in time. If we viewed our movie as independent
one-dimensional slices through its frames, where
each slice was viewed as pixel intensities across that one
dimension changing over time (like a dynamic mass spec
trace), this view would provide significantly more information,
but it would still be very difficult to understand the
meaning of the movie by looking at all of the one-dimensional
traces independently, unless more sophisticated
mathematical algorithms were employed to link the information
together.
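The arithmetic and the two views described in this analogy can be made concrete with a small sketch. The frame and pixel counts come from the text; the toy movie (a single bright pixel sweeping across a four-pixel row) and the helper names `make_toy_movie`, `time_average`, and `pixel_trace` are illustrative assumptions introduced here, not anything specified by the original.

```python
# Counts quoted in the text: a 1.5-hour film at 30 frames per second,
# with 1280 x 720 pixels per frame.
HOURS, FPS = 1.5, 30
WIDTH, HEIGHT = 1280, 720

total_frames = int(HOURS * 3600 * FPS)   # number of frames in the film
pixels_per_frame = WIDTH * HEIGHT        # roughly one million


def make_toy_movie(n_frames, width):
    """Hypothetical toy movie: each frame is one row of pixels, and a single
    bright pixel moves one step to the right per frame, wrapping around."""
    return [[1.0 if x == t % width else 0.0 for x in range(width)]
            for t in range(n_frames)]


def time_average(movie):
    """Collapse the movie into one frame by averaging each pixel over time."""
    n = len(movie)
    width = len(movie[0])
    return [sum(frame[x] for frame in movie) / n for x in range(width)]


def pixel_trace(movie, x):
    """Intensity of a single pixel over time (a one-dimensional slice)."""
    return [frame[x] for frame in movie]


movie = make_toy_movie(n_frames=8, width=4)
average_frame = time_average(movie)  # all pixels share the same average value
trace_0 = pixel_trace(movie, 0)      # when pixel 0 lights up
trace_2 = pixel_trace(movie, 2)      # when pixel 2 lights up
```

In this toy case the averaged frame is uniform, so the motion is invisible in it, while the individual traces differ in timing but only reveal the sweep when they are considered together.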
Despite the complexity of biological systems, even at
the cellular level, research in the context of large-scale
high-dimensional 'omics' data has tended to focus on
single data dimensions [3–5,14]. It is apparent that if
different biological data dimensions could be formally
considered simultaneously, we would achieve a more
complete understanding of biological systems [3,4,15–17].
(See the documentary film 'The New Biology' at
http://www.youtube.com/watch?v=sjTQD6E3lH4.)
To obtain a more complete understanding of biological
systems, we must not only evolve technologies to sample
systems at ever higher rates and with ever greater breadth,
but must also innovate methods that consider many
different dimensions of information to produce more
descriptive models (movies) of the system. There are of
course many different types of modeling approaches that
have been and continue to be explored. Descriptive models
quantify relationships among variables in data that can in
turn enable classification of systems under study into
different meaningful groups, whether stratifying disease
populations into disease subtypes to assign patients to the
most appropriate treatment, or categorizing customers by
product preference. Descriptive models are useful for
classifying, but cannot necessarily be used to predict how
any given variable will respond to another at the individual
level. For example, whereas patterns of gene expression,
such as those identified for breast cancer and now in play at
companies such as Genomic Health, can very well distinguish
good from poor prognoses [18,19], such models are
not generally as useful for understanding how genes in such
patterns are causally related, or for distinguishing key
driver genes from passenger genes.
Predictive models, on the other hand, incorporate
historic and current data to predict how one variable may
respond to another in a particular context, or predict
response or future states of components of a system at the
individual level. In the biological context, predictive
models aim to accurately predict (in silico) molecule
expression-level changes, cell state dynamics, and phenotype
transitions in response to specific perturbation events.
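The distinction between descriptive and predictive models can be illustrated with a deliberately simple sketch. Everything in it is an assumption made for illustration: the four toy samples, the median split standing in for a descriptive classifier, and the straight-line least-squares fit standing in for a predictive model. Real models of expression changes or phenotype transitions are far richer than this.

```python
from statistics import mean, median

# Hypothetical toy data: (expression score, perturbation dose, observed response).
samples = [
    (0.2, 1.0, 2.1),
    (0.3, 2.0, 3.9),
    (0.8, 3.0, 6.2),
    (0.9, 4.0, 7.8),
]


def descriptive_groups(scores):
    """Descriptive view: stratify samples into 'low' and 'high' groups
    around the median score. This labels samples but forecasts nothing."""
    cut = median(scores)
    return ["high" if s > cut else "low" for s in scores]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return slope, y_bar - slope * x_bar


scores = [s for s, _, _ in samples]
doses = [d for _, d, _ in samples]
responses = [r for _, _, r in samples]

# Descriptive: group membership only.
groups = descriptive_groups(scores)

# Predictive (toy): estimate the response to a dose that was never observed.
slope, intercept = fit_line(doses, responses)
predicted_response = slope * 5.0 + intercept
```

The grouping step answers "which class does this sample belong to?", whereas the fitted line answers "what response should we expect under a new perturbation?", which is the kind of question the text assigns to predictive models.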