Currently, state-of-the-art sequence database search tools employ more sophisticated
algorithms to reduce computation time and increase sensitivity to weak similarities.
For example, tools such as SAM 22 and HMMER 23 employ hidden Markov models to identify
sequence homology. Perhaps the most widely used search tool is the Basic Local Alignment
Search Tool, or BLAST, first presented by Altschul and coworkers in 1990. 24 At its inception,
BLAST offered high-sensitivity database searching at speeds much faster than any previous
algorithm, and proved amenable to mathematical and statistical analysis. Subsequent
versions, such as gapped BLAST and position-specific iterative (PSI) BLAST, have further
improved computation time and sensitivity to weak, but still biologically relevant,
similarity. 25 Functional prediction via BLAST is further enhanced through coupling with the
Conserved Domain Database (CDD), 26 which integrates data from sources such as Pfam 27
and SMART 28 to identify regions of the query sequence with evolutionarily conserved
functions, such as binding a metal ion or cofactor. BLAST results can also be coupled to
tools such as GCView, 29 which enable analysis of the genomic context of search results
to facilitate more accurate functional prediction.
In recent years, predictive algorithms have expanded beyond individual coding sequence
queries to a variety of other targets. For example, IsoRankN enables the alignment of entire
protein-protein interaction networks for the prediction of functional orthologues across
species. 30 Tools such as PromPredict, 31 ConTra, 32 and RSAT, 33 among others, focus not on
protein sequences, but on the sequences of regulatory regions such as promoters and
transcription factor binding sites. Such tools have clear applications in synthetic biology,
not only for the design of biosynthetic pathways, but also for synthetic gene circuits
and signal transduction systems.
PATHWAY DISCOVERY, PREDICTION, AND ANALYSIS
The computational tools described above are generally applicable to any DNA or protein
sequence, and as a result have proven very useful for a wide range of applications. For the
synthesis of drugs and drug candidates, however, special attention is paid to those proteins
that are involved in secondary metabolism, because secondary
metabolite natural products and their derivatives and analogues represent a substantial
fraction of the drugs available today. For example, in 2007 it was reported that 72.9% of
anticancer drugs and 68.9% of small-molecule anti-infectives are natural products or derived
therefrom. 1 As a result, a number of tools have been developed for the discovery,
prediction, and analysis of secondary metabolite gene clusters (Table 10.1).
Two classes of natural products that have garnered significant research interest are
polyketides and nonribosomal peptides, complex compounds that are synthesized
by multimodular, assembly-line megasynthases known as polyketide synthases (PKSs)
and nonribosomal peptide synthetases (NRPSs), respectively. In the past 15 years, a number
of computational tools have been developed not only for the identification of PKS and
NRPS gene clusters from DNA sequence data, but also for the prediction of their
corresponding products. Some of the earliest efforts toward in silico prediction of NRPS
products focused on the specificity of adenylation domains. In 1997, de Crécy-Lagard and
coworkers examined 55 adenylation domain sequences to devise rules for specificity
prediction, but found that they could make good predictions in only 43% of
cases. 34 Two years later, however, analysis of the crystal structure of the adenylation domain
PheA involved in gramicidin S biosynthesis enabled two groups to provide much more
accurate specificity predictions. Stachelhaus and coworkers identified 10 specificity-
conferring residues, allowing 86% accuracy in specificity prediction. 35 Challis and coworkers
took a very similar approach, identifying an eight-residue signature sequence. 36 More
recently, a sophisticated prediction algorithm based on transductive support vector
machines has been devised that also incorporates the physicochemical properties of the