the computer has imported new techniques into biology. These tech-
niques have, in turn, engendered new approaches directed toward an-
swering general or high-level questions about biological systems.
Bioinformatics at Work
But this discussion of the scope of bioinformatics fails to provide any
account of how bioinformatics does its work. How are the data actu-
ally used to answer these big questions? What do computers actually do
to reduce vast amounts of data to meaningful knowledge? Answering
these questions requires a closer look at what is going on inside a com-
putational biology lab. Following the work of the Burge lab in detail
demonstrates the importance of statistical approaches in this work. The
lab used computers not only to analyze data, but also in specific ways as
tools for statistics and simulation.
During my fieldwork, the Burge lab began collaborating with Illumina
Inc., the manufacturer of the brand-new (in 2007) next-generation
Solexa high-throughput sequencing machines. Illumina hoped to dem-
onstrate the usefulness of its machines by providing data to academic
biologists who would analyze those data and publish their findings. The
Burge lab was happy to accept a large volume of new data. The Solexa
machines were able to produce an unprecedented amount of sequence
data in a remarkably short time. At a practical level, the collaboration
with Illumina had provided the Burge lab with a data set of such size
and richness that the team felt they did not have the resources to analyze
it completely. This data glut presented a fortunate circumstance for my
fieldwork: plenty of interesting projects and problems were available on
which I could cut my teeth.
The data themselves were gene expression data from four distinct tis-
sues in the human body (heart, liver, brain, skeletal muscle). That is, they
consisted of millions of short sequences of mRNA. DNA is transcribed
into mRNA (messenger RNA) before being translated into proteins, so
a collection of mRNA sequences provides a snapshot of the genes that
are being expressed (that is, made into proteins) in particular cells at a
particular time. Before Solexa technology, amassing an equivalent set of
expression data would have been a significant experimental program in
its own right.[25] Solexa machines produced the data automatically from
a few runs of the machine over a few days.
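
To make the idea of an expression "snapshot" concrete, here is a minimal
sketch of how short reads might be tallied per gene and per tissue. It is
not the Burge lab's method: the gene names, transcript sequences, and reads
are invented toy data, and exact substring matching stands in for the
alignment software a real analysis would use.

```python
# Hypothetical toy transcripts, written in the RNA alphabet for readability.
# Real mRNA sequences are far longer, and these gene names are made up.
TRANSCRIPTS = {
    "geneA": "AUGGCUUCAGGAUCCGUAA",
    "geneB": "AUGCCGAUUACGGAAUGA",
}

# Hypothetical short reads for two of the tissues named in the text.
READS_BY_TISSUE = {
    "heart": ["GCUUCAGG", "UCAGGAUC", "CGAUUACG"],
    "liver": ["CGAUUACG", "UUACGGAA", "AUGCCGAU"],
}


def count_expression(reads, transcripts):
    """Count the reads that occur as exact substrings of each transcript.

    Assumption: exact matching is a simplification; it ignores sequencing
    errors and reads that could belong to more than one gene.
    """
    counts = {gene: 0 for gene in transcripts}
    for read in reads:
        for gene, sequence in transcripts.items():
            if read in sequence:
                counts[gene] += 1
    return counts


# One count table per tissue: a crude expression profile.
profile = {
    tissue: count_expression(reads, TRANSCRIPTS)
    for tissue, reads in READS_BY_TISSUE.items()
}

for tissue, counts in profile.items():
    print(tissue, counts)
```

With these toy inputs, geneA accounts for most of the heart reads and none
of the liver reads. A difference in read counts between tissues is the kind
of signal that is read as a difference in expression.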
In all higher organisms, the mRNA transcript is spliced: introns are
cut out and discarded, and exons are stitched together by the cellular
machinery. Since the Solexa sequencing took place post-splicing, the se-