database to “store entirely new types of data that could not be easily
integrated into the original structure.” 64 As plans for the HGP (and other
smaller genome projects) were developed, the concept of what a sequence
database was, and what it could be used for, had to be rethought.
The flat-file database, much like the early file management systems,
created a rigid ordering of entries with no explicit cross-linking
possible. A relational model would impose different kinds of orderings on
the data. The 1988 technical overview of GenBank justified the change
to a relational model on the following bases:
One, because the domain of knowledge we are dealing with is
extremely dynamic at this point in history, we had to expect our
understanding of the data to change radically during the lifetime
of the database. The relational model is well suited to such
applications. Two, even if our view of the inherent structure of
the data did not change, the ways in which the data could be
used almost certainly would change. This makes the ease of
performing ad hoc queries extremely important. 65
By the end of 1986, GenBank staff at Los Alamos had worked out a
structure to implement GenBank in relational form. Their plan was set
out in a document titled “A Relational Architecture for a Nucleotide
Sequence Database,” written by Michael Cinkosky and James Fickett. 66
The schema included thirty-three tables that described the sequence
itself, its physical context (for instance, its taxonomy or the type of
molecule it represented), its logical context (features such as exons, genes,
promoters), its citations, and pertinent operational data (tables of
synonyms). Tables could be modified or added to (or extra tables could
even be added) without disrupting the overall structure or having to
amend each entry individually.
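The two properties the GenBank staff cited, ease of ad hoc queries and room for the structure to change, can be illustrated with a minimal sketch using Python's built-in sqlite3 module. Every table name, column name, accession, and row here is a hypothetical stand-in chosen for illustration; none of it is taken from the actual thirty-three-table schema, and the accession column is assumed to serve as the key that ties the tables together.

```python
import sqlite3

# All table names, column names, and rows are hypothetical illustrations,
# not the actual 1986 GenBank schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sequences (
        accession TEXT PRIMARY KEY,
        residues  TEXT NOT NULL
    );
    CREATE TABLE taxonomy (
        accession TEXT NOT NULL REFERENCES sequences(accession),
        organism  TEXT NOT NULL
    );
    CREATE TABLE features (
        accession TEXT NOT NULL REFERENCES sequences(accession),
        kind      TEXT NOT NULL,
        start_pos INTEGER NOT NULL,
        end_pos   INTEGER NOT NULL
    );
""")

conn.executemany("INSERT INTO sequences VALUES (?, ?)",
                 [("DEMO0001", "ATGGCCATTGTAATGGGCCGC"),
                  ("DEMO0002", "TTGACAGCTAGCTCAGTCCTA")])
conn.executemany("INSERT INTO taxonomy VALUES (?, ?)",
                 [("DEMO0001", "Example organism A"),
                  ("DEMO0002", "Example organism B")])
conn.executemany("INSERT INTO features VALUES (?, ?, ?, ?)",
                 [("DEMO0001", "exon", 1, 12),
                  ("DEMO0001", "promoter", 13, 21),
                  ("DEMO0002", "promoter", 1, 21)])

# An ad hoc query that nobody planned for when the tables were designed:
# which organisms have a sequence with an annotated promoter, and where?
promoter_rows = conn.execute("""
    SELECT t.organism, f.accession, f.start_pos, f.end_pos
    FROM features AS f
    JOIN taxonomy AS t ON t.accession = f.accession
    WHERE f.kind = 'promoter'
    ORDER BY f.accession
""").fetchall()

# A new kind of data arrives later: add a table, leave existing entries alone.
before = conn.execute("SELECT * FROM sequences ORDER BY accession").fetchall()
conn.execute("""
    CREATE TABLE citations (
        accession TEXT NOT NULL REFERENCES sequences(accession),
        title     TEXT NOT NULL
    )
""")
conn.execute("INSERT INTO citations VALUES (?, ?)",
             ("DEMO0001", "Hypothetical paper title"))
after = conn.execute("SELECT * FROM sequences ORDER BY accession").fetchall()

print(promoter_rows)
print(before == after)  # existing sequence entries were not amended
```

In a flat file, the same change would mean editing the layout of every entry; here the new citations table simply points back at existing rows through the shared key.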
The descriptions of the “sequences” and “alignments” tables are
reproduced here. Each sequence is given an accession number that acts as
the primary key for the table. The “publication_#” and “reference_#”
keys link to a table of publications, and “entered_by” and “revised_by”
keys link to tables of people (curators or authors). As is noted in
the description, such sequences may not correspond to actual physical
fragments; that is, they may not represent a particular gene or a
particular sequence produced in a sequencing reaction. Rather, the
relationship between sequences and physical fragments is “many-to-many”: a
fragment may be made up of many sequences, and any given sequence
may be a part of multiple fragments. In other words, there is no straight-