them; therefore, they lack feature annotation. STSs [http://www.ncbi.nlm.nih.gov/dbSTS/]
are short genomic landmark sequences (1). They are operationally unique in that they are
specifically amplified from the genome by PCR. In addition, they define a
specific location on the genome and are, therefore, useful for mapping.
GSSs [http://www.ncbi.nlm.nih.gov/dbGSS/] are also short sequences but are derived from
genomic DNA, about which little is known. They include, but are not limited to, single-pass
GSSs, BAC ends, exon-trapped genomic sequences, and AluPCR sequences. EST, STS,
and GSS sequences reside in their respective divisions within GenBank, rather than in the
taxonomic division of the organism. The sequences are maintained within GenBank in the
dbEST, dbSTS, and dbGSS databases.
11. Submitting Data to dbEST, dbSTS, or dbGSS
Because of the large numbers of sequences that are submitted at once, dbEST, dbSTS, and
dbGSS entries are stored in relational databases where information that is common to all
sequences can be shared. Submissions consist of several files containing the common
information, plus a file of the sequences themselves. The three types of submissions have
different requirements, but all include a Publication file and a Contact file. See the dbEST
[http://www.ncbi.nlm.nih.gov/dbEST/], dbSTS [http://www.ncbi.nlm.nih.gov/dbSTS/], and
dbGSS [http://www.ncbi.nlm.nih.gov/dbGSS/] pages for the specific requirements for each
type of submission. In general, users generate the appropriate files for the submission type
and then email the files to batch-sub@ncbi.nlm.nih.gov. If the files are too big for email,
they can be deposited into an FTP account. Upon receipt, the files are examined by a
GenBank annotator, who fixes any errors when possible or contacts the submitter to request
corrected files. Once the files are satisfactory, they are loaded into the appropriate database
and assigned accession numbers. At this step, the data-loading software may detect
additional formatting errors, such as double quotes anywhere in the file or invalid
characters in the sequences. Again, if the annotator cannot fix the errors, a request for a
corrected submission is sent to the user. After all problems are resolved, the entries are
loaded into GenBank.
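The two formatting errors named above (double quotes anywhere in a file, invalid characters in a sequence) could be caught before sending with a short local check. The following is only a minimal sketch: the function names are hypothetical, it is not part of any NCBI tool, and it assumes that valid sequence characters are the IUPAC nucleotide codes, which the submission pages may define differently.

```python
# Hypothetical pre-submission check; not part of any NCBI software.
# Assumption: valid sequence characters are the IUPAC nucleotide codes.
IUPAC_NUCLEOTIDES = set("ACGTURYSWKMBDHVN")


def find_double_quotes(text):
    """Return the 1-based numbers of lines that contain a double quote."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if '"' in line
    ]


def find_invalid_characters(sequence):
    """Return (1-based position, character) for each non-IUPAC character.

    Whitespace is skipped and does not count toward the position, so
    positions refer to residues rather than to raw file offsets.
    """
    problems = []
    position = 0
    for char in sequence:
        if char.isspace():
            continue  # ignore line breaks and spacing inside the sequence
        position += 1
        if char.upper() not in IUPAC_NUCLEOTIDES:
            problems.append((position, char))
    return problems
```

Running both functions over each file before emailing it would flag the affected lines and residues, so they can be corrected without waiting for a request from the annotator.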
12. Bulk Submissions: HTC and FLIC
HTC records are High-Throughput cDNA/mRNA submissions that are similar to ESTs but
often contain more information. For example, HTC entries often have a systematic gene
name (not necessarily an official gene name) that is related to the lab or center that
submitted them, and the longest open reading frame is often annotated as a coding region.
FLIC (Full-Length Insert cDNA) records contain the entire sequence of a cloned
cDNA/mRNA. Therefore, FLICs are generally longer, and sometimes even full-length,
mRNAs. They are usually annotated with genes and coding regions, although these may be
lab systematic names rather than functional names.
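To illustrate what annotating the longest open reading frame involves, the sketch below scans a cDNA sequence for the longest stretch that begins with ATG and ends at the first in-frame stop codon. It is an illustration under stated assumptions, not the method used by any submitter or by GenBank: it assumes the standard stop codons, searches only the three forward-strand frames, and ignores ORFs that run off the end of the sequence. The function name is hypothetical.

```python
# Assumption: standard genetic code stop codons, DNA alphabet.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(sequence):
    """Return (start, end) of the longest ATG-to-stop ORF, or None.

    Coordinates are 0-based and end-exclusive, so sequence[start:end]
    is the ORF including its stop codon. Only the three forward-strand
    reading frames are searched.
    """
    seq = sequence.upper()
    best = None
    for frame in range(3):
        start = None  # position of the first ATG of the currently open ORF
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                if best is None or end - start > best[1] - best[0]:
                    best = (start, end)
                start = None  # close this ORF and look for the next ATG
    return best
```

For example, `longest_orf("CCATGAAATAGCC")` finds the ORF `ATGAAATAG` at positions 2 to 11. A production tool would also consider the reverse strand and alternative genetic codes.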
13. HTC Submissions
HTC entries are usually generated with Sequin [http://www.ncbi.nlm.nih.gov/Sequin/index.html]
or tbl2asn [http://www.ncbi.nlm.nih.gov/Sequin/table.html], and the
files are emailed to gb-sub@ncbi.nlm.nih.gov. Larger files may be submitted by
SequinMacrosend [www.ncbi.nlm.nih.gov/LargeDirSubs/dir_submit.cgi]. HTC entries