between GenBank staff and the submitter is stored with the entry. For updates to entries
already present in the public database, the live version of the entry is retrieved from ID, and
after making changes, the annotator loads the entry back into the public database. This
entry is available to the public immediately after loading.
18. Microbial Genomes
The GenBank direct submissions group has processed more than 200 complete microbial
genomes since 1996. These genomes are relatively small in size compared with their
eukaryotic counterparts, ranging from five hundred thousand to five million bases.
Nonetheless, these genomes can contain thousands of genes, coding regions, and structural
RNAs; therefore, processing and presenting them correctly is a challenge.
Submitters of complete genomes are encouraged to contact us at
genomes@ncbi.nlm.nih.gov before preparing their entries. An FTP account is required to
submit large files, and the submission should be deposited at least 1 month before
publication to allow time for processing and for a release coordinated with publication. In
addition, submitters are required to follow certain guidelines, such as providing unique
identifiers for proteins and systematic names for all genes. Entries should be prepared with
the submission tool tbl2asn 8, a utility that is part of the Sequin package. This utility creates
an ASN.1 submission file from a five-column, tab-delimited file containing feature
annotation, a FASTA-formatted nucleotide sequence, and an optional FASTA-formatted
protein sequence. For more information about using tbl2asn to submit microbial genomes, see
http://www.ncbi.nlm.nih.gov/Genbank/genomesubmit.html.
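As a rough illustration of the inputs described above, the following Python sketch writes a five-column feature table and a FASTA nucleotide sequence for a couple of protein-coding genes, and rejects duplicate gene or protein identifiers. It is not part of tbl2asn or Sequin. The sequence ID `contig1`, the locus-tag prefix `MYORG_`, and the `gnl|mylab|...` protein-identifier form are hypothetical placeholders. The exact qualifiers GenBank expects should be checked against the submission guidelines at the URL above.

```python
"""Hypothetical helper that prepares tbl2asn-style input text; not an NCBI tool."""
from dataclasses import dataclass


@dataclass
class Gene:
    """One protein-coding gene. Coordinates are 1-based and inclusive, with
    start <= stop; `strand` gives the orientation."""
    start: int
    stop: int
    strand: str      # "+" or "-"
    locus_tag: str   # systematic gene name (hypothetical "MYORG_" prefix)
    product: str
    protein_id: str  # unique protein identifier (hypothetical "gnl|mylab|..." form)


def check_unique_ids(genes: list[Gene]) -> None:
    """Raise ValueError if any locus_tag or any protein_id is used twice."""
    for attr in ("locus_tag", "protein_id"):
        seen: set[str] = set()
        for g in genes:
            value = getattr(g, attr)
            if value in seen:
                raise ValueError(f"duplicate {attr}: {value}")
            seen.add(value)


def feature_table(seq_id: str, genes: list[Gene]) -> str:
    """Render genes as a five-column, tab-delimited feature table.

    Feature lines fill columns 1-3 (start, stop, feature key). Qualifier lines
    leave those columns empty and fill columns 4-5 (qualifier key, value).
    Minus-strand features are written with start > stop.
    """
    check_unique_ids(genes)
    lines = [f">Feature {seq_id}"]
    for g in genes:
        a, b = (g.start, g.stop) if g.strand == "+" else (g.stop, g.start)
        lines += [
            f"{a}\t{b}\tgene",
            f"\t\t\tlocus_tag\t{g.locus_tag}",
            f"{a}\t{b}\tCDS",
            f"\t\t\tproduct\t{g.product}",
            f"\t\t\tprotein_id\t{g.protein_id}",
        ]
    return "\n".join(lines) + "\n"


def fasta(seq_id: str, seq: str, width: int = 60) -> str:
    """Format one nucleotide sequence as FASTA, wrapped at `width` bases per line."""
    body = "\n".join(seq[i:i + width] for i in range(0, len(seq), width))
    return f">{seq_id}\n{body}\n"


if __name__ == "__main__":
    genes = [
        Gene(1, 900, "+", "MYORG_0001", "hypothetical protein", "gnl|mylab|MYORG_0001"),
        Gene(1501, 2400, "-", "MYORG_0002", "hypothetical protein", "gnl|mylab|MYORG_0002"),
    ]
    print(feature_table("contig1", genes), end="")
    print(fasta("contig1", "ACGT" * 30), end="")
```

Running the script prints both texts. In practice each would be saved to its own file (for example, `contig1.tbl` and `contig1.fsa`) before tbl2asn is run. Giving the gene and its CDS identical coordinates is a simplification in this sketch. Real annotation may also need partial-end markers, RNA features, and further qualifiers.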
Complete genome submissions are reviewed by a member of the GenBank
annotation staff to ensure that the annotation and the gene and protein identifiers are correct,
and that the entry is in proper GenBank format. Any problems with the entry are resolved
through communication with the submitter. The microbial genome records in GenBank are
the building blocks for the Microbial Genome Resources in Entrez Genomes.
19. Third Party Annotation (TPA) Sequence Database
The vast amount of publicly available data from the human genome project and other
genome sequencing efforts is a valuable resource for scientists throughout the world. A
laboratory studying a particular gene or gene family may have sequenced numerous cDNAs
but has neither the resources nor the inclination to sequence large genomic regions containing
the genes, especially when the sequence is available in public databases. The researcher
might then choose to download genomic sequences from GenBank and perform
experimental analyses on these sequences. However, because this researcher did not
perform the sequencing, the sequence, with its new annotations, cannot be submitted to
DDBJ/EMBL/GenBank. This is unfortunate because important scientific information is
being excluded from the public databases. To address this problem, the International
Nucleotide Sequence Database Collaboration established a separate section of the database
for such third-party annotations (see Third Party Annotation Sequence Database
[www.ncbi.nlm.nih.gov/Genbank/tpa.html]).
All sequences in the TPA database are derived from the publicly available
collection of sequences in DDBJ/EMBL/GenBank. Researchers can submit both new and
alternative annotations of genomic sequence to GenBank. TPA entries can also be created
8 http://intranet.ncbi.nlm.nih.gov:6224/ieb/DIRSUB/tbl2asn2.html