the sequence itself—becomes a particular sort of space with a particular
structure and with particular features. The computational representa-
tion has thoroughgoing consequences for how biological objects are
understood.
AceDB is a remarkably successful tool. The database quickly spread
from its original use as a worm sequencing tool to other whole-
organism sequencing projects. The Wellcome Trust Sanger Institute and
the Genome Institute at Washington University, both major laboratories
responsible for mapping and sequencing the human genome, adopted
AceDB. More recently, it has also been adapted for use with nonbiolog-
ical data. 25 By the late 1990s, the amounts of biological (and especially
sequence) data from a number of species, including humans, had grown
so much that it was realized that visual tools such as AceDB were going
to be crucial for understanding and managing this volume of informa-
tion. The problem was that, for biologists, the genome in its raw, textual
form was not usually a very useful thing to work with—who can make
sense of 3 billion As, Gs, Ts, and Cs? Biologists needed ways of viewing
these data, of seeing them in a way that immediately brings their salient
features (such as genes) into view. As Durbin and Thierry-Mieg wrote,
“Clearly what is required is a database system that, in addition to stor-
ing the results of large scale sequencing and mapping projects, allows
all sorts of experimental genetic data to be maintained and linked to the
maps and sequences in as flexible a way as possible.”
As more and more of the human genome was sequenced, the need
for a powerful system of annotation became ever more pressing. In
1998, Celera Genomics, a private company, challenged the public HGP,
claiming that it could sequence the human genome better and faster.
As the rivalry between the public consortium and Celera intensified,
the project directors realized that the battle would be lost or won not
just through sequencing speed, but also through the presentation and
representation of the genomic data. 26 Durbin put one of his graduate
students to work on the problem. Ewan Birney, along with Tim Hub-
bard and Michele Clamp, generated the code for what became Ensembl,
a “bioinformatics framework to organize biology around the sequences
of large genomes.” 27
Chapter 4 has already described some of my experiences working
with the team responsible for maintaining Ensembl at the European Bio-
informatics Institute (EBI). I will elaborate on that work here in order
to describe how AceDB evolved to cope with the even larger amounts of
data that emerged after the completion of the HGP. Ensembl's primary
goal was to solve the problems of representing such large data sets in