In an attempt to convince his committee and others of the value
of his work, Ostell also embarked on applying his software to various
biological problems, collaborating with others in the Harvard BioLabs.
This effort resulted in significant success, particularly in using his
programs to analyze conservation patterns and codon bias to determine
protein-coding regions and exon boundaries in Drosophila and broad
bean (Vicia faba) genes. 69 These sorts of problems have two significant
features. First, they require the manipulation and management of
large amounts of data. Analysis of conservation patterns, for instance,
requires organizing sequences from many organisms according to
homology before performing comparisons. Second, analyzing codon bias
and finding protein-coding regions are statistical problems. They treat
sequences as a stochastic space, where the problem is one of finding a
“signal” (a protein-coding region) amid the “noise” of bases. Consider
this excerpt from Ostell's thesis in which he explains how codon bias is
calculated:
Each sequence used to make the table is then compared to every
other sequence in the table by Pearson product moment correlation
coefficient. This is, the bias is calculated for each codon in
each of two sequences being compared. The correlation coefficient
is then calculated comparing the bias for every codon
between the two sequences. The correlation coefficient gives a
sense of the “goodness of fit” between the two tables. A
correlation coefficient is also calculated between the sequence and
aggregate table. Finally a C statistic, with and without strand
adjustment, is calculated for the sequence on both its correct
and incorrect strands. These calculations give an idea how well
every sequence fits the aggregate data, as well as revealing
relationships between pairs of sequences. 70
Countless similar examples could be taken from the text: the basis of
Ostell's programs was the use of the computer as a tool for managing
and performing statistical analysis on sequences.
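
The pairwise comparison described in the excerpt can be sketched in a few lines of Python. This is only an illustration, not Ostell's code: the excerpt does not define how "bias" is computed, so the sketch assumes it is simply each codon's relative frequency in reading frame 0. The function names (codon_bias, pearson, pairwise_correlations) are hypothetical, and the C statistic and the aggregate table are left out because the excerpt gives no formula for them.

```python
from itertools import product
from math import sqrt

# All 64 codons in a fixed order, so that every table lines up.
CODONS = ["".join(p) for p in product("ACGT", repeat=3)]


def codon_bias(seq):
    """Relative frequency of each codon in reading frame 0.

    ASSUMPTION: "bias" is taken to mean plain relative frequency;
    the thesis excerpt does not give the actual definition.
    """
    seq = seq.upper()
    counts = dict.fromkeys(CODONS, 0)
    for i in range(0, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon in counts:  # skip codons with ambiguous bases such as N
            counts[codon] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no complete codons")
    return [counts[c] / total for c in CODONS]


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length tables."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant table")
    return cov / sqrt(var_x * var_y)


def pairwise_correlations(seqs):
    """Compare every sequence with every other one, as in the excerpt.

    `seqs` maps a (hypothetical) sequence name to its DNA string.
    """
    tables = {name: codon_bias(s) for name, s in seqs.items()}
    names = sorted(tables)
    return {
        (a, b): pearson(tables[a], tables[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }
```

Under these assumptions, a coefficient near 1 means that two sequences favor the same codons in similar proportions, which is the sense of "goodness of fit" between two tables that the excerpt describes.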
The story has a happy ending. By 1987, Ostell's committee allowed
him to submit his thesis. As David Lipman began to assemble a team
for the new National Center for Biotechnology Information (NCBI),
he realized that he had to employ Ostell—there was no one else in the
world who had such a deep understanding of the informational needs
of biologists. Selling the rights to his software so as not to create a
conflict of interest, Ostell began work at the NCBI in November 1988