cystic fibrosis or Duchenne muscular dystrophy. 48 For such an activity,
a flat-file database was an appropriate resource: each sequence entry in
the database could be roughly equated with one gene, and that gene had
a specific, definite, and singular effect on the biology of the organism in
which it dwelt. The NIH imagined that researchers would use the data-
base primarily as a central repository or archive into which sequences
would be deposited once and for all; for the most part, sequences would
only need to be retrieved one at a time in order to make comparisons
with experimental work. 49 Similar sequences usually resulted in similar
protein structures with similar functions; hence, matching an unknown
sequence to a known sequence in the database could provide invaluable
information. For such activities, the gene of interest could be simply
compared with the long list of entries in the database one by one. The
flat-file database structure was ideal for this kind of search operation;
it entailed and represented a theory of how sequence elements acted to
produce biological effects.
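The one-by-one comparison described above can be sketched as a linear scan over a list of entries. This is a minimal illustration, not the search method GenBank or its users actually employed: the entry names, the sequences, and the scoring function `identity` (fraction of matching positions, with no alignment or gaps) are all assumptions made for the example.

```python
def identity(a: str, b: str) -> float:
    """Fraction of matching positions over the shorter length (toy score, no alignment)."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / n


def best_match(query, entries):
    """Compare the query with every entry in turn and return (name, score) of the best."""
    best_name, best_score = None, -1.0
    for name, seq in entries:  # one pass through the flat list, entry by entry
        score = identity(query, seq)
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score


# Hypothetical entries standing in for records in a flat file.
entries = [
    ("entry_A", "ATGGCCATTG"),
    ("entry_B", "ATGGCGATTG"),
    ("entry_C", "TTTTTTTTTT"),
]

result = best_match("ATGGCGATTC", entries)
print(result)
```

The scan touches each entry exactly once and needs no index or cross-references between entries, which is why a flat list of records suited this kind of query.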
Once GenBank began operations in July 1982, it became clear to
those doing the work of collection and distribution at Los Alamos and
BBN that the database was attracting a far wider scope of use. As well
as revealing the sequences of genes, the new sequencing technologies
had an unexpected consequence: they allowed biologists to sequence
not only individual genes, but also regulatory regions, structural RNA-
coding regions, regions of unknown function, and even whole genomes
(at first limited to small genomes such as those of viruses or cloning
vectors). This meant that a sequence in the database did not necessarily
correspond neatly to a single gene. Molecular geneticists began to real-
ize that not all the information necessary to understand gene action was
contained within the gene sequence—how the gene was spliced, where
it was expressed, and how it was phosphorylated were also crucially
important. 50 In the flat-file format, such information was contained within
the “Features” table for each entry. The Features table consisted of a
single line for each feature, as in this example:
FT   firstexon   EXON   273-286
FT   tatabox     TATA   577-595
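Read as one record per line, each feature in the example carries a line tag (FT) followed by a name, a type, and a coordinate range. A minimal sketch of reading such lines into structured records follows. It assumes the fields are separated by whitespace and the coordinates take the form start-end; the exact column layout of the historical format is not given here, and the names `Feature` and `parse_feature_line` are hypothetical.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    name: str   # feature name, e.g. "firstexon"
    kind: str   # feature type, e.g. "EXON" or "TATA"
    start: int  # first coordinate of the feature in the sequence
    end: int    # last coordinate of the feature in the sequence


def parse_feature_line(line: str) -> Feature:
    """Parse one whitespace-separated feature line (assumed layout: FT name type start-end)."""
    tag, name, kind, coords = line.split()
    if tag != "FT":
        raise ValueError(f"not a feature line: {line!r}")
    start, end = (int(part) for part in coords.split("-"))
    return Feature(name, kind, start, end)


lines = [
    "FT firstexon EXON 273-286",
    "FT tatabox TATA 577-595",
]
features = [parse_feature_line(line) for line in lines]
for feature in features:
    print(feature)
```

Because every feature occupies a single self-contained line, a parser of this kind can only list features for an entry; it has no way to express relationships between features or between entries.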
The three columns identified the name of the feature (such as “firstexon”),
the feature type (here an exon or a TATA box), and the coordinates in
the sequence at which that feature was to be found. 51 Entering this in-