as shortcuts for copying large numbers of databases at once. In order
to achieve the last step, the release coordinator must be “on” (that is,
logged into) the staging server. The document, over forty pages long in
all, provides step-by-step instructions on how to move oneself and the
data around in virtual space in order to perform the release cycle.
This description of the work at EBI shows the complexity of collecting,
storing, and organizing biological data. Without such work, genomic
biology would not be possible. Ensembl is an essential tool for managing
big data, for making long strings of As, Gs, Ts, and Cs into “genes,”
“regulatory elements,” and other biological objects. The Ensembl web-
based interface attracts hundreds of thousands of users per month, and
it has been cited in hundreds of research articles.³⁸
User-biologists can download short sequences or whole genomes, access
specific data based on Ensembl's annotations (for instance, finding only
particular genes), compare data between genomes, analyze intraspecies
variation, or examine
regulatory sequences. What biologists can do with Ensembl is exactly
the kind of thing biologists need to do all the time: it is the very work of
biology itself. For example, the following excerpt comes from a website
giving advice about how to use Ensembl:
Let's say you have a set of genes in one species and you want
to know the orthologs in another species and gene expression
probes in that species you can use to assay those orthologs. For
example, [given] 25 gene expression probes that are dysregulated
in humans when exposed to benzene. What if you only
had the U133A/B Affymetrix probe IDs and wanted to know
the gene names? What if you also wanted all the Ensembl gene
IDs, names, and descriptions of the mouse orthologs for these
human genes? Further, what are the mouse Affymetrix 430Av2
probe IDs that you can use to assay these genes' expression in
mouse? All this can be accomplished for a list of genes in about
60 seconds using [Ensembl].³⁹
Ensembl's tools allow biologists to deal rapidly with large amounts of
data in different formats, comparing them across different organisms.
Ensembl and other tools like it are the most valuable resources available
for studying genes and genomes, allowing biologists to manipulate and
analyze the vast amount of available genomic data.
Bioinformatics is the task of ordering biological data so as to
make them usable for biological research. Bioinformaticians must strive
to maintain close control over their spaces, restricting access and pro-