both versions of the protein must fold correctly, or at least function in
the same way. By examining many such closely related proteins, Gra-
ham can build up a statistical theory of what kinds of changes can be
made to a protein without causing it to fail to fold. However, each pair
of proteins can contribute only a tiny amount of information about
possible nucleotide substitutions, and therefore many protein pairs are
required for this kind of analysis. As such, much of Graham's work is in
constant need of more sequence data:
The strategy I'm trying to adopt is: let's take some of the classic
tests and measures [for the adaptation and evolution of proteins]
and recast them in ways where, if we had all the data in the
world, then the result would be clearly interpretable. The only
problem that we have is not having all the data in the world.
Graham's work proceeds as if “not having all the data in the world” is
a problem that will soon be solved for all practical purposes. Soon after
the interview quoted above, the Broad Institute published the genomes
of twelve different species of fruit flies, providing a gold mine of data
for Graham's work. 14
The growth of sequence data has become so rapid that biologists
commonly compare it to Moore's law, which describes the exponential
growth of computing power (in 1965, Moore predicted a doubling every
year; in 1975 he revised this to every two years). In 2012, the quantity of sequence data was doubling roughly
every six months. Some argue that, in the long run, sequence data may
outstrip the ability of information technology to effectively store, pro-
cess, or analyze them. 15 However, this argument overlooks the fact that
growth in computing power (processing power and storage) may drive
the growth in biological data—it is at least in part because of Moore's
law that biologists can imagine “having all the data in the world” as
a realizable goal. There is a feedback loop between Moore's law and
biological work that drives biology toward an ever-greater hunger for
data and the ability to process them instantly. Over the duration of my
fieldwork, the greatest excitement in the field (and probably in all of
biology) came not from a new discovery in evolution or genetics or
physiology, but rather from a new set of technologies. The buzz about the
so-called next-generation (or next-gen) sequencing technologies domi-
nated the headlines of journals and the chatter in the lunchroom. 16 In
countless presentations, the audience was shown charts illustrating the
growth in sequence data over time or (conversely) the reduction in cost
per base pair produced (figure 2.2). Biologists clamored for these new