from these coding regions. The BankIt validator compares the amino acid sequence
provided by the submitter with the conceptual translation of the coding region based on the
provided spans. If there is a discrepancy, the submitter is requested to fix the problem, and
the process is halted until the error is resolved. To prevent the deposit of sequences that
contain cloning vector sequence, a BLAST similarity search is performed on the sequence,
comparing it to the VecScreen [http://www.ncbi.nlm.nih.gov/VecScreen/VecScreen.html]
database. If there is a match to this database, the user is asked to remove the contaminating
vector sequence from their submission or provide an explanation as to why the screen was
positive. Completed forms are saved in ASN.1 format, and the entry is submitted to the
GenBank processing queue. The submitter receives confirmation by email, indicating that
the submission process was successful.
Sequin
Sequin 6 is more appropriate for complicated submissions containing a significant amount of
annotation or many sequences. It is a stand-alone application available on NCBI's FTP
[ftp://ftp.ncbi.nih.gov/sequin/] site. Sequin
creates submissions from nucleotide and amino acid sequences in FASTA format with
tagged biological source information in the FASTA definition line. As in BankIt, Sequin
has the ability to predict the spans of coding regions. Alternatively, a submitter can specify
the spans of their coding regions in a five-column, tab-delimited table
[http://www.ncbi.nlm.nih.gov/Sequin/table.html] and import that table into Sequin. For
submitting multiple, related sequences, e.g., those in a phylogenetic or population study,
Sequin accepts the output of many popular multiple sequence-alignment packages,
including FASTA+GAP, PHYLIP, MACAW, NEXUS Interleaved, and NEXUS
Contiguous. It also allows users to annotate features in a single record or a set of records
globally. For more information on Sequin, see Chapter 12.
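The coding-region check described above, in which the protein supplied by the submitter is compared with the conceptual translation of the stated span, can be sketched roughly as follows. This is an illustration only, not the actual BankIt or Sequin code: it assumes a single plus-strand span with 1-based inclusive coordinates and the standard genetic code, and the function names translate_span and first_discrepancy are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), with bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_span(nucleotides, start, stop):
    """Translate the 1-based, inclusive span start..stop on the plus strand.

    Assumption: one contiguous span, no joins, no minus-strand features.
    Translation stops at the first stop codon.
    """
    cds = nucleotides[start - 1:stop].upper()
    protein = []
    for i in range(0, len(cds) - len(cds) % 3, 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")  # X for codons with ambiguous bases
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def first_discrepancy(nucleotides, start, stop, submitted_protein):
    """Return None if the translation matches the submitted protein,
    otherwise the 1-based amino acid position of the first difference."""
    translated = translate_span(nucleotides, start, stop)
    submitted = submitted_protein.upper().rstrip("*")
    if translated == submitted:
        return None
    for pos, (a, b) in enumerate(zip(translated, submitted), start=1):
        if a != b:
            return pos
    # One sequence is a prefix of the other: the difference is the length.
    return min(len(translated), len(submitted)) + 1


# Example with a made-up sequence: the span 3..14 covers ATG GCT AAA TAA.
example = "GGATGGCTAAATAAGG"
print(translate_span(example, 3, 14))
print(first_discrepancy(example, 3, 14, "MAR"))
```

A real validator would also need to handle features on the minus strand, multi-interval spans, partial coding regions, and alternative genetic codes, none of which is covered here.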
Completed Sequin submissions should be emailed to GenBank at gb-sub@ncbi.nlm.nih.gov.
Larger files may be submitted by SequinMacrosend
[www.ncbi.nlm.nih.gov/LargeDirSubs/dir_submit.cgi].
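For the five-column, tab-delimited feature table mentioned above, a small sketch of a writer may help show the layout. It assumes the commonly documented form: a ">Feature" header line naming the sequence, feature lines with start, stop, and feature key in the first three columns, and qualifier lines with the first three columns empty followed by the qualifier name and value. The helper name feature_table, the sequence identifier, and the qualifier values are hypothetical; the table description linked in the text remains the authority on the format.

```python
def feature_table(seq_id, features):
    """Build a five-column, tab-delimited feature table as a string.

    features is a list of (start, stop, feature_key, qualifiers) tuples,
    where qualifiers is a list of (name, value) pairs.
    """
    lines = [f">Feature {seq_id}"]
    for start, stop, key, qualifiers in features:
        # Columns 1-3: start, stop, feature key.
        lines.append(f"{start}\t{stop}\t{key}")
        for name, value in qualifiers:
            # Columns 1-3 empty; columns 4-5: qualifier name and value.
            lines.append(f"\t\t\t{name}\t{value}")
    return "\n".join(lines) + "\n"


# Hypothetical example: one gene and its coding region on sequence "Seq1".
table = feature_table(
    "Seq1",
    [
        (1, 300, "gene", [("gene", "abcD")]),
        (1, 300, "CDS", [("product", "hypothetical protein")]),
    ],
)
print(table)
```

Writing the table from structured data rather than by hand makes it harder to misplace a tab, which is the most common way such files become unreadable to an importer.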
17. Sequence Data Flow and Processing: From Laboratory to GenBank
17.1 Triage
All direct submissions to GenBank, created either by Sequin or BankIt, are processed by
the GenBank annotation staff. The first step in processing submissions is called triage.
Within two working days of receipt, the database staff reviews the submission to determine
whether it meets the minimal criteria for incorporation into GenBank and then assigns an
Accession number to each sequence. All sequences must be >50 bp in length and be
sequenced by, or on behalf of, the group submitting the sequence. GenBank will not accept
sequences constructed in silico; noncontiguous sequences containing internal, unsequenced
spacers; or sequences for which there is not a physical counterpart, such as those derived
from a mix of genomic DNA and mRNA. Submissions are also checked to determine
whether they are new sequences or updates to sequences submitted previously. After
receiving Accession numbers, the sequences are put into a queue for more extensive
processing and review by the annotation staff.
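A submitter could pre-screen sequences against the mechanical parts of these criteria before sending them. The sketch below is not the procedure used by the GenBank staff. The length threshold comes from the text; treating a long internal run of N as an "unsequenced spacer" is an assumption, as are the 20-base run length and the function name triage_problems. Criteria such as provenance or in silico construction cannot be checked from the sequence alone.

```python
import re

MIN_LENGTH_BP = 50  # from the text: sequences must be >50 bp


def triage_problems(sequence, spacer_run=20):
    """Return a list of human-readable problems found in a nucleotide sequence.

    spacer_run is a hypothetical threshold: an internal run of at least this
    many N characters is flagged as a possible unsequenced spacer.
    """
    seq = "".join(sequence.split()).upper()  # drop whitespace and line breaks
    problems = []
    if len(seq) <= MIN_LENGTH_BP:
        problems.append(
            f"length {len(seq)} bp is not greater than {MIN_LENGTH_BP} bp"
        )
    # Only internal runs count, so leading and trailing N are stripped first.
    if re.search(f"N{{{spacer_run},}}", seq.strip("N")):
        problems.append(
            f"internal run of {spacer_run} or more N (possible unsequenced spacer)"
        )
    return problems


# Made-up sequences for illustration.
print(triage_problems("ACGT" * 20))                          # 80 bp, no N
print(triage_problems("ACGT" * 10))                          # 40 bp, too short
print(triage_problems("ACGT" * 10 + "N" * 30 + "ACGT" * 10))  # internal spacer
```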
6 [http://www.ncbi.nlm.nih.gov/Sequin/index.html]