number of computers to which a user can connect via ssh, and bioinformaticians are routinely logged into many at once, using a window-based operating system to flick back and forth between the various connections. For instance, a bioinformatician's task might involve simultaneously writing a program on his or her own machine, looking up a database on a public server, and copying data to or from disk space on a third machine. A large part of the virtuosity of such work is in being able to move oneself and one's data rapidly between places. Indeed, bioinformaticians, like software engineers, constantly seek to reduce the number of keystrokes necessary to move around. They can do this by setting up aliases, short commands that act as abbreviations of longer ones. Or they can use their knowledge of programming languages such as Perl and regular expressions to find a shortcut for all but the most intricate of maneuvers. In programs and on the command line, it is common to see bioinformaticians using abstruse strings (for instance: “^(?:[^f]|f(?!oo))*$”) in order to save themselves extra typing. Having a working grasp of such intricacies, combined with a knowledge of where important files and programs are located on the network, makes a skillful bioinformatician.
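To give a sense of what such a string does, the pattern ^(?:[^f]|f(?!oo))*$ can be read as "a line that never contains foo": at every position it accepts either any character other than f, or an f that is not immediately followed by oo (a negative lookahead). The sketch below is only an illustration; it uses Python's re module rather than Perl, and the function name and sample words are invented for the example.

```python
import re

# The quoted pattern: zero or more characters, each of which is either
# not "f", or an "f" that is not followed by "oo". Anchored with ^ and $,
# the whole string must consist of such characters, so it matches only
# strings that never contain the substring "foo".
NO_FOO = re.compile(r"^(?:[^f]|f(?!oo))*$")


def lacks_foo(text: str) -> bool:
    """Hypothetical helper: True if `text` contains no occurrence of "foo"."""
    return NO_FOO.match(text) is not None


# Invented sample strings, chosen to exercise the lookahead.
for word in ["fox", "food", "buffoon", "off", ""]:
    print(repr(word), lacks_foo(word))
```

Inside a program, the same test could be written more transparently as "foo" not in text; the appeal of the compact form is that it can be typed directly into any regex-based search tool that supports lookahead, with no surrounding program at all.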
Much of the work of bioinformatics can be understood as the movement and transformation of data in virtual space. At EBI, I closely followed the progress of the “release cycle,” a process that occurs every couple of months through which the EBI's main database (known as Ensembl) is revised and updated. A detailed description of the release cycle will illustrate the importance of space management in bioinformatic work.
Much of the work of the release coordinator is making sure that the right data end up in the right place in the right form. Ensembl does not produce its own data; instead, its role is to collect data from a wide variety of sources and make them widely available in a common, coherent, and consistent format. Ensembl is also an automatic annotation system: it is software that takes raw genomic sequence and identifies the locations of particular structures (e.g., genes) and their functions. Making a release involves collecting data from many individuals and places and running them through a software pipeline. For such a large set of databases, it is not possible to simply update them one by one. Ensembl requires a sophisticated “staging” system whereby the new release is prepared, processed, tested, and checked for consistency before it is released “live” onto the World Wide Web. Thus the release cycle becomes a carefully choreographed set of movements through a virtual space in