Ordering Objects - Life Out of Sequence

database to “store entirely new types of data that could not be easily integrated into the original structure.” 64 As plans for the HGP (and other smaller genome projects) were developed, the concept of what a sequence database was, and what it could be used for, had to be rethought.

The flat-file database, much like the early file management systems, created a rigid ordering of entries with no explicit cross-linking possible. A relational model would impose different kinds of orderings on the data. The 1988 technical overview of GenBank justified the change to a relational model on the following bases:

One, because the domain of knowledge we are dealing with is extremely dynamic at this point in history, we had to expect our understanding of the data to change radically during the lifetime of the database. The relational model is well suited to such applications. Two, even if our view of the inherent structure of the data did not change, the ways in which the data could be used almost certainly would change. This makes the ease of performing ad hoc queries extremely important. 65
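
The second point in the passage above, the value of ad hoc queries, can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module purely for illustration. The table and column names (sequences, features, accession, organism, kind), the helper accessions_with, and the sample rows are all hypothetical; none of them is taken from the GenBank schema.

```python
import sqlite3

# Hypothetical two-table layout; names and rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sequences (
        accession TEXT PRIMARY KEY,
        organism  TEXT NOT NULL,
        length    INTEGER NOT NULL
    );
    CREATE TABLE features (
        accession TEXT NOT NULL REFERENCES sequences(accession),
        kind      TEXT NOT NULL   -- for example 'exon' or 'promoter'
    );
""")
conn.executemany(
    "INSERT INTO sequences VALUES (?, ?, ?)",
    [("X00001", "Homo sapiens", 1200),
     ("X00002", "Mus musculus", 800),
     ("X00003", "Homo sapiens", 450)],
)
conn.executemany(
    "INSERT INTO features VALUES (?, ?)",
    [("X00001", "exon"), ("X00001", "promoter"),
     ("X00002", "exon"), ("X00003", "promoter")],
)


def accessions_with(kind, organism):
    """Ad hoc question: which sequences from `organism` carry a feature of `kind`?"""
    rows = conn.execute(
        """
        SELECT DISTINCT s.accession
        FROM sequences AS s
        JOIN features AS f ON f.accession = s.accession
        WHERE f.kind = ? AND s.organism = ?
        ORDER BY s.accession
        """,
        (kind, organism),
    ).fetchall()
    return [row[0] for row in rows]


promoter_hits = accessions_with("promoter", "Homo sapiens")
print(promoter_hits)
```

The question asked here was not anticipated when the tables were defined. It is answered by joining the tables on a shared key rather than by reading through entries in a fixed order, which is the kind of flexibility the overview describes.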

By the end of 1986, GenBank staff at Los Alamos had worked out a structure to implement GenBank in relational form. Their plan was set out in a document titled “A Relational Architecture for a Nucleotide Sequence Database,” written by Michael Cinkosky and James Fickett. 66 The schema included thirty-three tables that described the sequence itself, its physical context (for instance, its taxonomy or the type of molecule it represented), its logical context (features such as exons, genes, promoters), its citations, and pertinent operational data (tables of synonyms). Tables could be modified or added to (or extra tables could even be added) without disrupting the overall structure or having to amend each entry individually.
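
The claim that tables can be extended or added without amending each entry can be sketched as follows. This is an illustrative example using Python's built-in sqlite3 module, not a reconstruction of the Cinkosky and Fickett schema: the tables (sequences, taxonomy), the columns, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Initial, deliberately small schema (hypothetical names and data).
conn.execute(
    "CREATE TABLE sequences (accession TEXT PRIMARY KEY, sequence TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO sequences VALUES (?, ?)",
    [("X00001", "ATGGCC"), ("X00002", "TTGACA")],
)
before = conn.execute(
    "SELECT accession, sequence FROM sequences ORDER BY accession"
).fetchall()

# Later change 1: add a whole new table that points back at existing entries.
conn.execute(
    """
    CREATE TABLE taxonomy (
        accession TEXT NOT NULL REFERENCES sequences(accession),
        organism  TEXT NOT NULL
    )
    """
)
conn.execute("INSERT INTO taxonomy VALUES (?, ?)", ("X00001", "Homo sapiens"))

# Later change 2: add a column to the existing table.
conn.execute("ALTER TABLE sequences ADD COLUMN molecule_type TEXT")

# Existing entries were not rewritten: their original fields are unchanged,
# and the new column is simply empty (NULL) until someone fills it in.
after = conn.execute(
    "SELECT accession, sequence FROM sequences ORDER BY accession"
).fetchall()
new_column = conn.execute("SELECT molecule_type FROM sequences").fetchall()
print(before == after, new_column)
```

In a flat file, by contrast, adding a new kind of information generally means changing the layout of every entry that should carry it.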

The descriptions of the “sequences” and “alignments” tables are reproduced here. Each sequence is given an accession number that acts as the primary key for the table. The “publication_#” and “reference_#” keys link to a table of publications, and “entered_by” and “revised_by” keys link to tables of people (curators or authors). As is noted in the description, such sequences may not correspond to actual physical fragments; that is, they may not represent a particular gene or a particular sequence produced in a sequencing reaction. Rather, the relationship between sequences and physical fragments is “many-to-many”: a fragment may be made up of many sequences, and any given sequence may be a part of multiple fragments. In other words, there is no straight-
