Biology Reference
In-Depth Information
Table 1. The major genome annotation pipelines.

Organization    Programs                                   Website
Ensembl         Genscan, Exonerate, GeneWise, Genomewise   www.ensembl.org
NCBI            Gnomon                                     www.ncbi.nlm.nih.gov
UCSC            BLAT, BlastZ, MultiZ                       www.genome.ucsc.edu
Softberry Inc.  FGENESH, FGENESH++, FGENESH_C              www.softberry.com
FlyBase         Genscan, Genie, BLASTN+Sim4                www.flybase.org

characterization of the gene content and the corresponding proteome is
of particular importance. The requirements of large-scale sequence
annotation have inspired the development of a handful of major
computational pipelines to annotate genomic features (Table 1). This is
a complex task, especially when bearing in mind that the mere definition
of a gene is still being revised. 4 Although human expert interpretation
generally surpasses most automated approaches in accurately predicting
gene/feature structures, manual curation is time-consuming and cannot be
scaled up to keep pace with the ever-increasing rate of sequencing.
Thus, while accurate computational feature identification from DNA
sequences remains a challenging problem, substantial progress has been
achieved, particularly through the exploitation of comparative genomics.
There are two main approaches in sequence analysis: (a) ab initio
gene prediction methods that rely on statistical analysis of sequence
composition to recognize features such as exons and introns; and (b)
knowledge-based approaches that rely on available homology to known
genes in other organisms as well as other primary data such as
organism-specific expressed sequence tags (ESTs) and cDNAs. Generally,
the knowledge-based approaches are more accurate when there is a
sufficient amount of
 