region identification based on CRITICA (Badger and Olsen, 1999). Gapped BLAST is an extension
of the BLAST method that incorporates statistical analysis of alignments with gaps leading to high
search sensitivity. PSI-BLAST (position-specific iterative BLAST) enables one to identify similarities
between distant protein families (Altschul et al., 1997; Altschul and Koonin, 1998). Gene searching
models such as CRITICA, Glimmer or Generation are employed during the genome annotation process
to identify sequences coding for proteins or various types of RNA (tRNA, rRNA), ribosome-binding
sites, terminators, insertion sequences, promoter regions, laterally transferred genes and non-coding
regions or genes showing relatively weak sequence similarities. These last regions, designated the
'twilight zone', pose a real challenge for automatic sequence annotation software programmes
(Koonin and Galperin, 1997). Some of these programmes are GeneQuiz (a workbench for sequence
analysis; Scharf et al., 1994) and the MAGPIE system (a fully automated genome analysis
programme; Gaasterland and Sensen, 1996). The former is an annotation platform that serves
both prokaryotic and eukaryotic sequence annotation, while the latter is meant exclusively for
prokaryotic genome annotation. Other prokaryotic annotation platforms are Imagene (Medigue et al.,
1999), ATUGC (Bazzan et al., 2003), GenDB (Meyer et al., 2003), SABIA (Almeida et al., 2004), MaGe
(Vallenet et al., 2006) and AGMIAL (Bryson et al., 2006). Lombardot et al. (2006) created
Megx.net, a database resource for marine ecological genomics that is useful for genomic and
metagenomic data and through which genomes can be browsed and environmentally relevant protein
families and group-specific genes identified. Additionally, it is possible to identify laterally
transferred genes, or transposase and phage insertions, with the TETRA software tool, which computes
tetranucleotide usage patterns. To address the problems created by wrong annotation and to provide a
means of expert review, Markowitz et al. (2009) created a website known as IMG ER that helps in the
systematic and efficient revision of microbial genome annotation.
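
The tetranucleotide usage patterns mentioned above can be illustrated with a minimal Python sketch. It only counts overlapping 4-mers on the given strand and converts the counts to relative frequencies. The function name tetranucleotide_frequencies is hypothetical, and the statistics that TETRA itself applies to such counts are not reproduced here.

```python
from collections import Counter
from itertools import product


def tetranucleotide_frequencies(sequence):
    """Return relative frequencies of all 256 tetranucleotides.

    Illustrative assumption: overlapping windows on the given strand only;
    windows containing characters other than A, C, G or T are skipped.
    """
    seq = sequence.upper()
    alphabet = "ACGT"
    counts = Counter()
    for i in range(len(seq) - 3):
        window = seq[i:i + 4]
        if all(base in alphabet for base in window):
            counts[window] += 1
    total = sum(counts.values())
    all_tetramers = ("".join(p) for p in product(alphabet, repeat=4))
    # Tetramers that never occur get a frequency of 0.0.
    return {t: (counts[t] / total if total else 0.0) for t in all_tetramers}
```

One plausible use, stated here as an assumption rather than as a description of TETRA, is to compare the profile of a genomic region with the genome-wide profile and to inspect regions whose usage differs markedly.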
From the known gene sequence, the amino acid sequence of the putative protein is deduced
and functional annotation is carried out, which is a challenging task because one has to take into
account the molecular, cellular and phenotypic functions of the particular protein. At the molecular
level, it has to be determined whether the protein is an enzyme, transporter, repressor or structural
protein. At the cellular level, the role of the protein in a particular metabolic pathway or signalling
cascade has to be assessed. Finally, the effect of the protein on general properties of the organism,
such as gliding motility, cellular appendages or sporulation, has to be determined. A quite useful approach
in genome annotation is to identify pairs of close bidirectional best hits (BeTs) across two
genomes by taking into account conserved gene clusters between them (Overbeek et al., 1999). Genes
that are co-transcribed are generally associated with the same function or participate in the same
metabolic pathway. Identification of operons not only helps in understanding gene regulation but
also provides important information for genome annotation. One can take into account the presence
of gene clusters and their order in well-studied prokaryotic systems such as E. coli by using intergenic
distance distributions and the functional relationships between the genes. Methods so developed have been
useful in predicting the existence of operons with a maximum accuracy of 88% in the E. coli chromosome
(Salgado et al., 2000). Prediction of operons by a computational method is based on the presence
of conserved gene pairs and the frequencies of their occurrence in various bacterial and archaeal
genomes, the critical points of evaluation being the maintenance of a certain intergenic distance
and that all genes in an operon are on the same strand. This method, however, does not take into
account either the functions of the genes or their promoters and terminators (Ermolaeva et al., 2001).
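
The two structural criteria named above, a bounded intergenic distance and a common strand, can be sketched in Python as follows. This is a simplified illustration, not the method of Ermolaeva et al. (2001): it ignores the conservation of gene pairs across genomes, and the Gene class, the candidate_operons function and the max_gap threshold of 50 bp are all hypothetical choices made for the example.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


def candidate_operons(genes, max_gap=50):
    """Group adjacent same-strand genes whose intergenic gap is <= max_gap.

    max_gap is a hypothetical threshold chosen only for illustration.
    """
    ordered = sorted(genes, key=lambda g: g.start)
    groups = []
    current = []
    for gene in ordered:
        if current:
            prev = current[-1]
            gap = gene.start - prev.end - 1  # negative when genes overlap
            if gene.strand == prev.strand and gap <= max_gap:
                current.append(gene)
                continue
            groups.append(current)
        current = [gene]
    if current:
        groups.append(current)
    # Only runs of two or more genes are reported as candidate operons.
    return [[g.name for g in group] for group in groups if len(group) > 1]
```

A strand change or a gap above the threshold closes the current group, so each returned list is a run of neighbouring genes that satisfies both criteria. Like the method described in the text, the sketch uses neither gene function nor promoters and terminators.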