17.2 Indexing
Triaged submissions are subjected to a thorough examination, referred to as the indexing
phase. Here, entries are checked for:
1. Biological validity. For example, does the conceptual translation of a coding region
match the amino acid sequence provided by the submitter? Annotators also ensure that the
source organism name and lineage are present, and that they are represented in NCBI's
taxonomy database. If either of these is not true, the submitter is asked to correct the
problem. Entries are also subjected to a series of BLAST similarity searches to compare the
annotation with existing sequences in GenBank.
2. Vector contamination. Entries are screened against NCBI's UniVec database [7] to detect
contaminating cloning vector sequence.
3. Publication status. If there is a published citation, PubMed and MEDLINE identifiers are
added to the entry so that the sequence and publication records can be linked in Entrez.
4. Formatting and spelling. If there are problems with the sequence or annotation, the
annotator works with the submitter to correct them.
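The first check above, comparing the conceptual translation of a coding region with the submitted amino acid sequence, can be illustrated with a minimal Python sketch. It assumes the standard genetic code (NCBI translation table 1) and a coding region that starts in frame; real submissions may use other genetic codes, partial coding regions, or ambiguous bases. The function names are hypothetical and do not represent GenBank's actual validation software.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), in the conventional
# TCAG codon ordering: TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def conceptual_translation(cds: str) -> str:
    """Translate a coding sequence codon by codon, stopping at the first stop codon."""
    cds = cds.upper().replace("U", "T")
    protein = []
    # Ignore any trailing bases that do not form a complete codon.
    for i in range(0, len(cds) - len(cds) % 3, 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")  # X marks codons with ambiguous bases
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def translation_matches(cds: str, submitted_protein: str) -> bool:
    """Hypothetical check: does the translation equal the submitter's protein?"""
    return conceptual_translation(cds) == submitted_protein.upper().rstrip("*")


print(conceptual_translation("ATGGCCAAGTAA"))      # MAK
print(translation_matches("ATGGCCAAGTAA", "MAR"))  # False
```

A mismatch such as the second call is the kind of discrepancy that would be referred back to the submitter for correction.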
Completed entries are sent to the submitter for a final review before release into the
public database. If the submitters requested that their sequences be released after
processing, they have 5 days to make changes prior to release. The submitter may also
request that GenBank hold their sequence until a future date. The sequence must become
publicly available once the Accession number or the sequence has been published. The
GenBank annotation staff currently processes about 2200 submissions per month,
corresponding to approximately 26,000 sequences. GenBank annotation staff must also
respond to email inquiries that arrive at the rate of approximately 300 per day. These
exchanges address a range of topics including:
• updates to existing GenBank records, such as new annotation or sequence changes
• problem resolution during the indexing phase
• requests for release of the submitter's sequence data or an extension of the hold date
• requests for release of sequences that have been published but are not yet available in
GenBank
• lists of Accession numbers that are due to appear in upcoming issues of a publisher's
journals
• reports of potential annotation problems with entries in the public database
• requests for information on how to submit data to GenBank
One annotator is responsible for handling all email received in a 24-hour period, and
all messages must be acted upon and replied to in a timely fashion. Replies to previous
emails are forwarded to the appropriate annotator.
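The release rules described earlier in this section (a 5-day review window, an optional hold date, and mandatory release once the Accession number or sequence has been published) can be sketched as a small date calculation. This is an illustration only: the function name and parameters are hypothetical, the 5 days are assumed to be calendar days counted from the date the entry is sent for review, and publication is assumed to override any later planned release date.

```python
from datetime import date, timedelta
from typing import Optional

REVIEW_WINDOW = timedelta(days=5)  # from the text: 5 days to make changes


def earliest_release(review_sent: date,
                     hold_until: Optional[date] = None,
                     published_on: Optional[date] = None) -> date:
    """Return the date an entry becomes public under the assumed rules."""
    # Default plan: the requested hold date if there is one, otherwise
    # the end of the 5-day review window.
    planned = hold_until if hold_until is not None else review_sent + REVIEW_WINDOW
    # Assumption: publication of the Accession number or sequence
    # forces release if it happens before the planned date.
    if published_on is not None and published_on < planned:
        return published_on
    return planned


print(earliest_release(date(2024, 1, 10)))                            # 2024-01-15
print(earliest_release(date(2024, 1, 10), hold_until=date(2024, 6, 1),
                       published_on=date(2024, 3, 1)))                # 2024-03-01
```

In practice such decisions also involve correspondence with the submitter, so a calculation like this would at most suggest a date for an annotator to confirm.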
17.3 Processing Tools
The annotation staff uses a variety of tools to process and update sequence submissions.
Sequence records are edited with Sequin, which allows staff to annotate large sets of
records by global editing rather than changing each record individually. This saves
considerable time, because more than 100 entries can be edited in a single step. Records are stored in a
database that is accessed through a queue management tool that automates some of the
processing steps, such as looking up taxonomy and PubMed data, starting BLAST jobs, and
running automatic validation checks. Hence, when an annotator is ready to start working on
an entry, all of this information is ready to view. In addition, all of the correspondence
7 [http://www.ncbi.nlm.nih.gov/VecScreen/UniVec.html]