generates reports. For successful submissions, two files are generated: one contains the
submission in GenBank flat file format (without the sequence); and another is a status
report file. The status report file, ac4htgs, contains the genome center, sequence name,
Accession number, phase, create date, and update date for the submission. Submissions that
fail processing receive an error file with a short description of the error(s) that prevented
processing. The GenBank annotator also sends email to the submitter, explaining the errors
in further detail.
8. Additional Quality Assurance
When successful submissions are loaded into GenBank, they undergo additional validation
checks. If GenBank annotators find errors, they write to the submitters, asking them to fix
these errors and submit an update.
9. Whole Genome Shotgun Sequences (WGS)
Genome centers are taking multiple approaches to sequencing complete genomes from a
number of organisms. In addition to the traditional clone-based sequencing whose data are
being submitted to HTGS, these centers are also using a WGS 4 approach to sequence the
genome. The shotgun sequencing reads are assembled into contigs, which are now being
accepted for inclusion in GenBank. WGS contig assemblies may be updated as the
sequencing project progresses and new assemblies are computed. WGS sequence records
may also contain annotation, similar to other GenBank records. Each sequencing project is
assigned a stable project ID, which is made up of four letters. The Accession number for a
WGS sequence contains the project ID, a two-digit version number, and six digits for the
contig ID. For instance, a project would be assigned an Accession number
AAAX00000000. The first assembly version would be AAAX01000000. The last six digits
of this ID identify individual contigs. A master record for each assembly is created. This
master record contains information that is common among all records of the sequencing
project, such as the biological source, submitter, and publication information. There is also
a link to the range of Accession numbers for the individual contigs in this assembly. WGS
submissions can be created using tbl2asn 5, a utility that is packaged with the Sequin
submission software. Information on submitting these sequences can be found at Whole
Genome Shotgun Submissions [http://www.ncbi.nlm.nih.gov/Genbank/wgs.html].
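
The Accession number layout described above (a four-letter project ID, a two-digit assembly version, and a six-digit contig ID) can be illustrated with a short Python sketch. This is not an NCBI tool: the names WgsAccession, parse_wgs_accession, and format_wgs_accession are hypothetical, and the pattern encodes only the format stated in this section, so it may not cover other Accession formats.

```python
import re
from typing import NamedTuple

# Assumption: the layout is exactly as described in the text above,
# i.e. four letters, two digits (version), six digits (contig ID).
_WGS_ACCESSION = re.compile(r"([A-Z]{4})([0-9]{2})([0-9]{6})")


class WgsAccession(NamedTuple):
    """Hypothetical container for the three parts of a WGS Accession number."""
    project_id: str
    assembly_version: int
    contig_id: int


def parse_wgs_accession(accession: str) -> WgsAccession:
    """Split a WGS Accession number into project ID, version, and contig ID."""
    match = _WGS_ACCESSION.fullmatch(accession.strip().upper())
    if match is None:
        raise ValueError(f"not a WGS-style Accession number: {accession!r}")
    project, version, contig = match.groups()
    return WgsAccession(project, int(version), int(contig))


def format_wgs_accession(project_id: str, assembly_version: int, contig_id: int) -> str:
    """Rebuild the 12-character Accession number, zero-padding the numeric parts."""
    return f"{project_id}{assembly_version:02d}{contig_id:06d}"


if __name__ == "__main__":
    # The two example numbers given in the text.
    print(parse_wgs_accession("AAAX00000000"))
    print(parse_wgs_accession("AAAX01000000"))
    # A contig within the first assembly version (contig number chosen arbitrarily).
    print(format_wgs_accession("AAAX", 1, 42))
```

Keeping the version and contig ID as integers makes it straightforward to enumerate a range of contigs within one assembly, while the formatting helper restores the zero padding.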
10. Bulk Submissions: EST, STS, and GSS
Expressed Sequence Tags (ESTs), Sequence Tagged Sites (STSs), and Genome Survey
Sequences (GSSs) are generally submitted in a batch and are usually part of a
large sequencing project devoted to a particular genome. These entries have a streamlined
submission process and undergo minimal processing before being loaded into GenBank. ESTs
[http://www.ncbi.nlm.nih.gov/dbEST/] are generally short (<1 kb), single-pass cDNA
sequences from a particular tissue and/or developmental stage. However, they can also be
longer sequences that are obtained by differential display or Rapid Amplification of cDNA
Ends (RACE) experiments. The common feature of all ESTs is that little is known about
4 [http://www.ncbi.nlm.nih.gov/GenBank/wgs.html]
5 [http://intranet.ncbi.nlm.nih.gov:6224/ieb/DIRSUB/tbl2asn2.html]