cases (those conserved across species); only a few of these cases were
chosen as representative samples to be verified by RT-PCR. The
biologist's work is still oriented toward the lab bench and the specificity of
particular samples. The computer scientist, on the other hand, is aiming
to digest more and more data at higher and higher speeds. This is not a
conflict about whether or not to use computers, but rather a battle over
different ways of working and knowing in biology.
Mass and Speed
The examples given in the previous section might be placed along a
spectrum: more traditional biological work lies at one end, while newer
computational approaches lie at the other. One way of placing work
along such a spectrum would be to characterize it in terms of mass and
speed. Whether we consider a high-throughput sequencing machine, an
automated laboratory, or a high-level meeting about research grants,
bioinformatic biology is obsessed with producing more data, more
rapidly. It possesses an insatiable hunger for data. Its activities are oriented
toward quantity. Data production, processing, analysis, and integration
proceed more and more rapidly: the faster data can be integrated into
the corpus of biological information, the faster more data can be
produced. Data production and data analysis are bound into a feedback
loop that drives a spiraling acceleration of both.
At a conference held at the NCBI for the twenty-fifth anniversary of
GenBank, Christian Burks (a group leader at Los Alamos when GenBank
was started) asked, “How much sequencing is enough?” Speaking
only slightly tongue in cheek, Burks began to calculate how many base
pairs existed in the universe and how long, using current technology, it
would take to sequence them. Although this, he admitted, represented
a ridiculous extreme, Burks then computed how long it would take to
sequence one representative of every living species, and then predicted
that we would eventually want about one hundred times this amount of
sequence data, the equivalent of one hundred individuals of each living
species. 13 No one in the audience seemed particularly surprised. The
ability and the need to sequence more, faster, had become obvious to at
least this community of biologists.
Another example of this voracious need for data is provided by the
work of Graham. Graham's work uses genomic and other data to try
to understand how and why errors occur in protein folding. Closely
related species usually possess closely related proteins, perhaps with only
a few nucleotide differences between them. Yet, since both species exist,