ers explained the importance of the virtual contig concept for both the
computer and the user:
[The virtual contig object] allows access to genomic sequence
and its annotation as if it was a continuous piece of DNA in
a 1-N coordinate space, regardless of how it is stored in the
database. This is important since it is impractical to store large
genome sequences as continuous pieces of DNA, not least because
this would mean updating the entire genome entry whenever
any single base changed. The VC object handles reading
and writing of features and behaves identically regardless of
whether the underlying sequence is stored as a single real piece
of DNA (a single raw contig) or an assembly of many fragments
of DNA (many raw contigs). Because features are always stored
at the raw contig level, “virtual contigs” are really virtual and
as a result less fragile to sequence assembly changes. It is this
feature that allows Ensembl to handle draft genome data in a
seamless way and makes it possible to change between different
genome assemblies relatively painlessly. 32
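The mechanism described in this passage can be made concrete with a small sketch. The Python below is an illustration only, not Ensembl's code (Ensembl's classes are written in Perl), and every name in it (RawContig, VirtualContig, locate, subseq, features) is hypothetical. It assumes the simplest possible case: raw contigs laid end to end, all in the same orientation, with no overlaps or gaps. Features are stored only on the raw contigs, and the virtual contig translates them into the 1-N coordinate space on demand.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class RawContig:
    """A stored fragment of sequence (hypothetical class, for illustration)."""
    name: str
    sequence: str
    # Features live here, in the fragment's own 1-based coordinates:
    # each entry is (start, end, label).
    features: list = field(default_factory=list)


class VirtualContig:
    """Presents an ordered list of raw contigs as one 1..N coordinate space."""

    def __init__(self, raw_contigs):
        self.raw_contigs = list(raw_contigs)
        # offsets[i] = number of bases that precede raw contig i
        self.offsets = []
        total = 0
        for contig in self.raw_contigs:
            self.offsets.append(total)
            total += len(contig.sequence)
        self.length = total

    def locate(self, position):
        """Map a 1-based virtual position to (raw contig, 1-based local position)."""
        if not 1 <= position <= self.length:
            raise IndexError(position)
        index = bisect_right(self.offsets, position - 1) - 1
        return self.raw_contigs[index], position - self.offsets[index]

    def subseq(self, start, end):
        """Return bases start..end (inclusive) as if they were one piece of DNA."""
        pieces = []
        for contig, offset in zip(self.raw_contigs, self.offsets):
            lo = max(start - offset, 1)
            hi = min(end - offset, len(contig.sequence))
            if lo <= hi:
                pieces.append(contig.sequence[lo - 1:hi])
        return "".join(pieces)

    def features(self):
        """Translate raw-contig features into virtual coordinates on the fly."""
        for contig, offset in zip(self.raw_contigs, self.offsets):
            for start, end, label in contig.features:
                yield (start + offset, end + offset, label)


# Two small made-up fragments, each carrying its own annotation.
a = RawContig("c1", "ACGT", [(2, 3, "exon")])
b = RawContig("c2", "GGCC", [(1, 2, "repeat")])

vc = VirtualContig([a, b])
print(vc.subseq(3, 6))        # spans the join between the two fragments
print(list(vc.features()))    # features in 1-N coordinates

# A changed assembly is just a new ordering; the stored features are untouched.
reassembled = VirtualContig([b, a])
print(list(reassembled.features()))
```

Because nothing is ever written in virtual coordinates, reordering the fragments changes only the offsets computed at construction time. That is the sense in which such an object is "less fragile to sequence assembly changes"; a fuller treatment would also need to handle fragment orientation and overlaps, which this sketch leaves out.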
The virtual contig object mediates between the computer and the lab,
both in allowing the database to be adaptable to the vagaries and uncertainties
of real biological data and in presenting the final sequence
in a way that makes sense to the biologist. But virtual contigs are also
abstract representations that allow bioinformaticians to cut away the
messiness of the internal database representations; in fact, they are
highly structured objects whose properties are determined by the Perl
classes to which they belong. Once again, the representation (the virtual
contig) alters what it is possible to do with the biological object itself
(the contig as an experimental object of sequencing) and how the biologist
imagines it. Biologists' intuition about the object is based on the
properties of its “adaptors”—that is, on how it can be moved around
and manipulated inside the database. In mediating between biological
and computational representations, these sorts of structures come to
redefine biological objects themselves.
The dependence of biology on computational representations becomes
even clearer if we examine the final and most frequently manipulated
layer of Ensembl: the web interface. It is at this point that biologists
interact with genomes every day in their routine work. The images
on the screen are not generated directly from the database. Rather, when
a remote user navigates to a particular region of the genome, the web