Drug Discovery and Development via Synthetic Biology - Synthetic Biology


bioinformatics tools, while the latter has been enabled by a number of sophisticated and versatile experimental tools.

Computational Tools

GENE IDENTIFICATION

In nature, there exists an incredibly diverse array of species adapted to survival in all kinds of environments. The genomes of these organisms contain the templates for countless proteins that catalyze a myriad of chemical transformations, many of which could be useful for synthetic biology applications. To tap this diversity, computational tools are essential for rapidly and accurately identifying the true protein-coding sequences within DNA sequence data. The earliest gene identification algorithms were developed mainly for the analysis of shorter DNA sequences in which the exact coding sequence of a protein was ambiguous. These methods were relatively simple, but they provided fairly accurate predictions of coding versus noncoding sequences. For example, the TESTCODE algorithm devised in 1982 misclassified only 5% of test sequences, but drew no conclusion for 20% of test sequences. 2
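To make the idea of a coding/noncoding call with a "no conclusion" zone concrete, here is a minimal Python sketch in the spirit of TESTCODE. It computes a Fickett-style position-asymmetry value for each base (how unevenly that base is spread across the three codon positions) and averages the four values. The averaging and both cutoffs are hypothetical simplifications; the published statistic also draws on base composition and converts each parameter into a coding probability with empirically derived tables, none of which is reproduced here.

```python
from collections import Counter

BASES = "ACGT"

def position_asymmetry(seq):
    """Fickett-style position parameter for each base:
    max(n1, n2, n3) / (min(n1, n2, n3) + 1), where n_i counts the base
    at codon positions i, i+3, i+6, ... of the window."""
    seq = seq.upper()
    by_position = [Counter(seq[i::3]) for i in range(3)]
    values = {}
    for base in BASES:
        n = [by_position[i][base] for i in range(3)]  # Counter returns 0 if absent
        values[base] = max(n) / (min(n) + 1)
    return values

def classify(seq, coding_cutoff=1.5, noncoding_cutoff=1.1):
    """Three-way call with a 'no opinion' zone between two cutoffs.
    The cutoffs are HYPOTHETICAL and only suit this simplified score."""
    score = sum(position_asymmetry(seq).values()) / len(BASES)
    if score >= coding_cutoff:
        return "coding", score
    if score <= noncoding_cutoff:
        return "noncoding", score
    return "no opinion", score

print(classify("ATG" * 10))  # ('coding', 7.5)
```

Scores falling between the two cutoffs are deliberately left unclassified, mirroring the trade-off described above: fewer misclassifications at the cost of some sequences receiving no call.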

Subsequent prediction algorithms employed more sophisticated approaches to achieve better results. For example, the GeneMark program of Borodovsky and McIninch (initially referred to as GENMARK) combined nonhomogeneous Markov chain models with Bayesian decision-making for coding sequence prediction. 3 This program also introduced simultaneous analysis of both DNA strands as a way of improving accuracy. Once the sequencing of entire genomes became a reality, the need for reliable gene prediction grew more pressing. To adapt GeneMark to entire bacterial genomes, a hidden Markov model framework was implemented, along with recognition of ribosome binding site sequences. 4 Further improvements came with the application of self-training to new prokaryotic genome sequences, 5 and with expansion to eukaryotic and viral systems. 6,7
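The combination of a periodic (nonhomogeneous) Markov chain with a Bayesian decision over both strands can be sketched in a few dozen lines of Python. This is an illustrative toy, not GeneMark's implementation: the Markov order K, the add-one smoothing, the equal prior split among frames, and the training sequences are all assumptions. Coding DNA is modeled with one transition table per codon position, noncoding DNA with a single table, and the posterior is computed over seven hypotheses: three reading frames on each strand plus noncoding.

```python
import math
from collections import defaultdict

K = 2  # Markov order; an illustrative choice, not GeneMark's setting
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Reverse complement, so the opposite strand can be scored too."""
    return seq.translate(COMPLEMENT)[::-1]

def train(seqs, periodic):
    """Return a function giving log P(base | preceding K bases).
    periodic=True keeps one table per codon position (a nonhomogeneous,
    period-3 chain); periodic=False keeps a single table."""
    tables = [defaultdict(lambda: defaultdict(int)) for _ in range(3 if periodic else 1)]
    for s in seqs:
        for i in range(K, len(s)):
            # training sequences are assumed to start at codon position 0
            tables[i % 3 if periodic else 0][s[i - K:i]][s[i]] += 1

    def logp(table, ctx, base):
        counts = tables[table].get(ctx, {})
        # add-one (Laplace) smoothing over the four bases
        return math.log((counts.get(base, 0) + 1) / (sum(counts.values()) + 4))
    return logp

def loglik(seq, logp, periodic, frame=0):
    """Log-likelihood; with a periodic model, base i sits at codon position (i + frame) % 3."""
    return sum(logp((i + frame) % 3 if periodic else 0, seq[i - K:i], seq[i])
               for i in range(K, len(seq)))

def posterior(seq, coding, noncoding, prior_coding=0.5):
    """Bayesian posterior over 3 frames x 2 strands, plus a noncoding hypothesis."""
    scores = {}
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        for frame in range(3):
            scores[f"{strand}{frame}"] = (loglik(s, coding, True, frame)
                                          + math.log(prior_coding / 6))
    scores["noncoding"] = loglik(seq, noncoding, False) + math.log(1 - prior_coding)
    top = max(scores.values())  # subtract the max before exp for numerical stability
    z = sum(math.exp(v - top) for v in scores.values())
    return {h: math.exp(v - top) / z for h, v in scores.items()}

coding_model = train(["GCT" * 9], periodic=True)     # toy "coding" training set
noncoding_model = train(["AT" * 8], periodic=False)  # toy "noncoding" training set
post = posterior("GCT" * 5, coding_model, noncoding_model)
print(max(post, key=post.get))  # prints +0
```

Because every hypothesis is scored against the same sequence, the posterior shows not only whether a region looks coding but also which strand and frame best explain it.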

GLIMMER is a complementary gene identification tool built on interpolated Markov models. 8,9 It has similarly been adapted to eukaryotic DNA, 10,11 as well as to endosymbiont and metagenome DNA. 12,13 These tools and others remain indispensable for identifying new and potentially interesting protein-coding sequences in the ever-expanding volume of available DNA sequence data.
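An interpolated Markov model blends predictions from several context lengths, leaning on longer contexts only when they have been observed often enough to be trusted. The Python class below is a minimal sketch of that idea; the linear weight min(1, n / threshold) and the threshold value are hypothetical stand-ins for GLIMMER's actual weighting rule.

```python
from collections import defaultdict

class InterpolatedMarkovModel:
    def __init__(self, max_order=3, threshold=10):
        self.max_order = max_order
        self.threshold = threshold  # HYPOTHETICAL: context count that earns full weight
        # counts[context][base] for every context length 0..max_order
        self.counts = defaultdict(lambda: defaultdict(int))

    def train(self, seqs):
        for s in seqs:
            for i in range(len(s)):
                # count base s[i] after each preceding context of length 0..max_order
                for k in range(min(i, self.max_order) + 1):
                    self.counts[s[i - k:i]][s[i]] += 1

    def prob(self, context, base):
        """P(base | context), built up from order 0 to the longest context:
        p_k = lam_k * ML_k + (1 - lam_k) * p_(k-1)."""
        context = context[-self.max_order:] if self.max_order else ""
        p = 0.25  # below order 0: uniform over the four bases
        for k in range(len(context) + 1):
            counts = self.counts.get(context[len(context) - k:], {})
            n = sum(counts.values())
            if n:
                lam = min(1.0, n / self.threshold)  # trust frequent contexts more
                p = lam * counts.get(base, 0) / n + (1 - lam) * p
        return p

model = InterpolatedMarkovModel(max_order=1, threshold=10)
model.train(["AAAA"])
print(model.prob("A", "A"))
```

A context that never appeared in training simply contributes nothing, so the estimate falls back on shorter contexts instead of collapsing to zero, which is the main advantage over a fixed-order model trained on limited data.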


PREDICTION OF GENE FUNCTION

Synthetic biologists are typically interested in proteins for the transformations they catalyze, but sequence information alone is not enough to describe a protein's utility.

Bench-top experiments, both in vitro and in vivo, are of course the best way to determine protein function. However, advances in DNA sequencing and coding sequence identification have produced so many putative protein targets that characterizing them all in the laboratory is simply not feasible. Fortunately, two proteins with similar primary sequences are quite likely to share similar functions as well. As a result, sequence alignment and homology analysis against proteins of known function have proved vital for accurately predicting protein function from sequence data alone.
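Homology-based function prediction can be illustrated with a Smith-Waterman local alignment score and a "best hit" annotation transfer. The scoring values (match +2, mismatch -1, linear gap -2), the min_score cutoff, and the database entries are all hypothetical; practical searches use substitution matrices such as BLOSUM or PAM and estimates of statistical significance rather than a raw-score threshold.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score (linear gap penalty), keeping one DP row at a time."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # local alignment: a cell never drops below 0 (the alignment restarts)
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best

def annotate(query, database, min_score=10):
    """Transfer the function of the best-scoring characterized protein,
    provided its score reaches min_score (a HYPOTHETICAL cutoff)."""
    hits = [(smith_waterman(query, seq), name, function)
            for name, (seq, function) in database.items()]
    score, name, function = max(hits, default=(0, None, None))
    return (function, name, score) if score >= min_score else None

# HYPOTHETICAL database of "characterized" proteins, for illustration only
DATABASE = {
    "ref_1": ("MKTAYIAKQRQISFVKSHFSRQ", "example function 1"),
    "ref_2": ("GSHMLEDPVDAFQ", "example function 2"),
}
print(annotate("TAYIAKQRQISF", DATABASE))  # ('example function 1', 'ref_1', 24)
```

The dynamic-programming alignment is exact but costs time proportional to the product of the two sequence lengths for every database entry, which becomes expensive as databases grow.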

The earliest exercises in protein homology comparison were carried out to evaluate evolutionary relationships rather than to predict function. 14-16 Nevertheless, these algorithms laid the groundwork upon which subsequent protein alignment tools were built. In 1985, Lipman and Pearson noted the growing number of available protein sequences and observed that functions could be inferred by comparison with other characterized proteins. They therefore developed the FASTP algorithm for rapid in silico comparison of a query sequence against a protein sequence database. 17 FASTP was followed by FASTA, which offered improved sensitivity, and LFASTA, which allowed for analysis of local similarity. 18,19 Other tools followed, such as MSA and CLUSTAL W, for high-sensitivity multiple alignment of smaller sets of proteins. 20,21
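FASTP's speed came from a heuristic first pass: rather than aligning every pair of residues, it looks up short identical words (k-tuples) shared by the query and each database sequence and counts them along diagonals, that is, along fixed offsets between query and database positions. The Python sketch below reproduces only that first step under simplifying assumptions; the later rescoring of the best regions with a substitution matrix and the final optimized alignment are omitted, and rank_database is a hypothetical helper.

```python
from collections import defaultdict

def kmer_index(query, ktup):
    """Lookup table: every k-tuple in the query -> list of its start positions."""
    index = defaultdict(list)
    for i in range(len(query) - ktup + 1):
        index[query[i:i + ktup]].append(i)
    return index

def diagonal_hits(query, subject, ktup=2):
    """Count shared k-tuples on each diagonal (offset = subject pos - query pos)."""
    index = kmer_index(query, ktup)
    diagonals = defaultdict(int)
    for j in range(len(subject) - ktup + 1):
        for i in index.get(subject[j:j + ktup], ()):
            diagonals[j - i] += 1
    return dict(diagonals)

def rank_database(query, database, ktup=2):
    """HYPOTHETICAL helper: rank entries by the largest hit count on any one diagonal."""
    best = {name: max(diagonal_hits(query, seq, ktup).values(), default=0)
            for name, seq in database.items()}
    return sorted(best.items(), key=lambda item: item[1], reverse=True)

print(diagonal_hits("ACDEF", "XXACDEF"))  # {2: 4}
```

Entries whose shared words pile up on a single diagonal are the ones worth aligning in full. A smaller ktup (for example 1) finds more weak matches at the cost of speed.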
