Box 7.1 Not All Genomes Are Equal: Categories of Genome-Analysis Quality
Type
Description
Standard Draft
Contains minimally filtered or unfiltered data, which are assembled into contigs.
This is the minimum standard for a submission to the public databases.
Sequence of this quality will likely harbor many regions of poor quality
and can be relatively incomplete, but is the least expensive to produce and
contains useful information.
High-quality Draft
Includes at least 90% of the genome or target regions, excluding contaminating
sequences. It is still a draft assembly with little or no manual review of the
product.
Improved High-quality Draft
Additional work has been done by either manual or automated methods. Should contain no
discernible misassemblies and should have undergone some type of gap resolution
to reduce the number of contigs and scaffolds. Low-quality regions and
potential base errors may be present, but the quality is adequate for
comparison with other genomes.
Annotation-directed Improvement Quality Draft
This may overlap with previous standards, but emphasizes the verification and
correction of anomalies within coding regions. Gene models and annotation
should fit the biology. Problems should be noted in the submission. Repeat
regions are not resolved.
Noncontiguous Finished
High-quality assembly subjected to automated and manual improvement, in which
gaps, misassemblies, and low-quality regions have been resolved. Some
repetitive or other areas, including heterochromatin, may not be resolved. For
eukaryotes, this used to be called “finished.”
Finished
Refers to the gold standard of less than 1 error per 100,000 bp, with each
replicon assembled into a single contiguous sequence. All sequences are
complete, reviewed and edited. Repetitive sequences have been assembled.
Achieved primarily with small microbial genomes.
Modified from Chain et al. (2009).
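The two numerical thresholds in Box 7.1 (at least 90% of the genome for a High-quality Draft, and less than 1 error per 100,000 bp for Finished) can be checked with simple arithmetic. The following is a minimal Python sketch, not part of the published standard: the function names, the genome size, and the error counts are hypothetical values chosen for illustration. The Phred-scale conversion (Q = -10 log10 of the error probability) is not mentioned in the box and is included only as a familiar way to express the same error rate.

```python
import math


def errors_per_100kb(errors: int, bases: int) -> float:
    """Normalize an observed error count to errors per 100,000 bp."""
    if bases <= 0:
        raise ValueError("bases must be positive")
    return errors * 100_000 / bases


def meets_finished_error_rate(errors: int, bases: int) -> bool:
    """True if the error rate is below 1 error per 100,000 bp (Box 7.1, Finished)."""
    return errors_per_100kb(errors, bases) < 1.0


def meets_high_quality_draft_coverage(assembled: int, genome_size: int) -> bool:
    """True if at least 90% of the genome is represented (Box 7.1, High-quality Draft)."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return assembled / genome_size >= 0.90


def phred_equivalent(errors: int, bases: int) -> float:
    """Express an error rate on the Phred scale: Q = -10 * log10(errors / bases)."""
    if errors == 0:
        return math.inf
    return -10 * math.log10(errors / bases)


if __name__ == "__main__":
    genome_size = 4_600_000  # hypothetical microbial genome, in bp
    print(meets_finished_error_rate(30, genome_size))
    print(meets_finished_error_rate(60, genome_size))
    print(meets_high_quality_draft_coverage(4_200_000, genome_size))
    print(round(phred_equivalent(1, 100_000), 1))
```

Under these assumptions, 1 error per 100,000 bp corresponds to roughly Phred Q50. The sketch covers only the two quantitative criteria; the other requirements in the box, such as contamination screening, gap resolution, and manual review, cannot be reduced to a single number.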
to other genes. This provides a preliminary set of annotations, but these are
provisional, and it is likely that up to one-third of your sequences will have no
detectable homology to any sequence in the databases. These so-called orphan genes may be of
particular interest because they may provide your species with the traits that
are unique to its biology. A community-based annotation phase comes next, in
which cooperating scientists assist in comparing gene families. Functional
analyses are also needed to confirm the roles of candidate genes and to
support conclusions about the function of specific genes. In addition to
annotating protein-coding genes, you can identify transposons, regulatory
regions, pseudogenes, and noncoding RNAs, although these
are more difficult to annotate. The resulting information can be published, but it provides
only a starting point for understanding the genome of your species. Genome
sequences can be used as the basis of experiments for years, using a variety