pairs of sequence B. With sufficiently sophisticated queries, it would be possible to join not only sequences, but also any features described in the tables (for example, to join all the exons from given sequences to reproduce a protein-coding region). Sequence data could be linked together dynamically by the user in a flexible manner. But within this flexibility, the relational structure emphasized the rearrangement of sequence elements. If the flat-file structure was gene-centric, the relational database was alignment-centric. It was designed to make visible the multiple possible orderings, combinations, and contexts of sequence elements.
By 1989, over 80% of GenBank's data had been imported into the relational database.67 The HGP and the relational sequence database could not have existed without each other; they came into being together. GenBank and the HGP became mutually constitutive projects, making each other thinkable and doable enterprises. Moreover, just as flat files had, both genome projects and relational database systems embodied a particular notion of biological action: namely, one centered on the genome as a densely networked and highly interconnected object.
In 1991, when Walter Gilbert wrote of a “paradigm shift” in biology, he argued that soon, “all the 'genes' will be known (in the sense of being resident in databases available electronically), and that the starting point of a biological investigation will be theoretical.”68 This “theory”
was built into the structure of the database: phenotype or function does
not depend on a single sequence, but rather depends in complicated
ways on arrangements of sets of different sequences. The relational
database was designed to represent such arrangements.
During the 1990s, biologists investigated the “added value that is provided by completely sequenced genomes in function prediction.”69 As the complete genomes of bacterial organisms, including Haemophilus influenzae, Mycoplasma genitalium, Methanococcus jannaschii, and Mycoplasma pneumoniae, became available in GenBank, biologists attempted to learn about biological function through comparative analysis. The existence of orthologs, the relative placement of genes in the genome, and the absence of genes provided important insights into the relationship between genotype and phenotype.70 The important differences among the bacteria and how they worked were not dependent on individual genes, but on their arrangements and combinations within their whole genomes. But this was exactly what the relational structure of GenBank was designed to expose: not the details of any particular sequence, but the ways in which sequences could be arranged and combined into different “alignments.”
GenBank as a relational database provided a structure for thinking