Biology Reference
In-Depth Information
munity, too, were concerned about the choice of a nonacademic institu-
tion to manage the distribution efforts. 46
Despite the fact that Goad's pilot project had used a sophisticated
database structure, the NIH insisted that the new data bank, which
would become GenBank, be built as a flat file. A flat file is a
text-based computer file that simply lists information about nucleotide
sequences line by line. Each line begins with a two-letter code
specifying the information to be found on that line: “ID” gives
identifying information about the sequence, “DT” gives the date of its
publication, “KW” provides keywords, “FT” lists features in the
sequence, and the sequence itself corresponds to lines beginning with
“SQ.” Different sequences could be listed one after another in a long
text file separated by the delimiter “//” (figure 5.1).
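To make the layout concrete, the following is a minimal Python sketch of how a program might read such a file. It relies only on the two conventions described above: a two-letter code at the start of each line and “//” between entries. The sample records, the accession-like names, and the exact spacing after the code are invented for illustration and are not taken from any real data bank release.

```python
from collections import defaultdict

# Invented sample data: two made-up entries in the layout described above.
SAMPLE = """\
ID   SEQ001
DT   1982-01-01
KW   globin; hemoglobin
FT   exon 1..12
SQ   ACGTACGTACGT
//
ID   SEQ002
DT   1982-02-15
KW   insulin
SQ   GGCCTTAA
SQ   TTAAGGCC
//
"""


def parse_flat_file(text):
    """Split a flat file into entries.

    Each entry is a dict mapping a two-letter line code to the list of
    values found on lines that begin with that code.
    """
    entries = []
    current = defaultdict(list)
    for line in text.splitlines():
        if line.strip() == "//":          # delimiter: close the current entry
            if current:
                entries.append(dict(current))
            current = defaultdict(list)
            continue
        if not line.strip():              # ignore blank lines
            continue
        code, value = line[:2], line[2:].strip()
        current[code].append(value)
    if current:                           # tolerate a missing final "//"
        entries.append(dict(current))
    return entries


entries = parse_flat_file(SAMPLE)
for entry in entries:
    # Sequence data may span several "SQ" lines, so join them.
    print(entry["ID"][0], "".join(entry["SQ"]))
```

The parser needs no knowledge of what each code means, which is part of what made the format easy to read for both people and simple programs.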
The NIH held the view that the GenBank format should be readable
both by computers and by humans. By using the two-letter line
identifiers, a simple program could extract information from the
flat-file entries. A major disadvantage of this format, however, was
the difficulty involved in updating it. If, for instance, curators
decided to add a further line recording the type of sequencing
experiment used to generate each sequence, they would have to modify
every sequence entry one by one. Moreover, a flat file does not lend
itself to the representation of relationships between different
entries: the list format makes it impossible to group entries in more
than one way or to link information across more than one entry.
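The updating problem can be illustrated with a short Python sketch. It assumes a hypothetical new line code, “XM”, for the sequencing method; this code and the two tiny entries are invented for the example. Because the file has no schema separate from its text, adding the new field means walking through the file and rewriting every entry.

```python
def add_line_to_every_entry(text, code, value, before="SQ"):
    """Return a copy of the flat file with a new line added to each entry.

    The new line is placed just ahead of the first line that starts with
    `before` in every entry. "XM" below is a hypothetical code.
    """
    out = []
    inserted = False
    for line in text.splitlines():
        if line.startswith(before) and not inserted:
            out.append(f"{code}   {value}")
            inserted = True
        out.append(line)
        if line.strip() == "//":          # entry ended; reset for the next one
            inserted = False
    return "\n".join(out) + "\n"


# Invented two-entry file in the layout described in the text.
FLAT = "ID   SEQ001\nSQ   ACGT\n//\nID   SEQ002\nSQ   GGCC\nSQ   TTAA\n//\n"

updated = add_line_to_every_entry(FLAT, "XM", "unknown")
print(updated)
```

Every entry in the file is touched, and the placeholder value would still have to be corrected by hand for each sequence. A structured database could instead declare the new field once.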
The flat-file format was suited to the NIH's notion that a nucleotide
database should be no more than a simple collection, a laundry list of
sequences. However, it also embodied a particular way of understanding
biology and the function of genes. George Beadle and Edward Tatum's
“one gene-one enzyme” hypothesis is considered one of the founding
dogmas of molecular biology. Although the idea (and its successor, “one
gene-one polypeptide”) had been shown to be an oversimplification
even by the 1950s, the notion that it is possible to understand life by
considering the actions of individual genes exerted a profound influence
on at least forty years of biological research. 47
In the late 1970s, as a result of the sequencing methods invented by
Allan Maxam, Walter Gilbert, and Frederick Sanger, the possibility of
discovering the mechanism of action of particular genes seemed within
reach. Some molecular geneticists began to focus their efforts on finding
and sequencing the genes responsible for particular diseases, such as