2. International Collaboration
In the mid-1990s, the GenBank database became part of the International Nucleotide
Sequence Database Collaboration with the EMBL database (European Bioinformatics
Institute [http://www.ebi.ac.uk/], Hinxton, United Kingdom) and the Genome Sequence
Database (GSDB; LANL, Los Alamos, NM). Subsequently, the GSDB was removed from
the Collaboration (by the National Center for Genome Resources, Santa Fe, NM), and
DDBJ [http://www.ddbj.nig.ac.jp/] (Mishima, Japan) joined the group. Each database has
its own set of submission and retrieval tools, but the three databases exchange data daily so
that all three databases should contain the same set of sequences. Members of the DDBJ,
EMBL, and GenBank staff meet annually to discuss technical issues, and an international
advisory board meets with the database staff to provide additional guidance. To avoid
conflicting data at the three sites, an entry can be updated only by the database that
initially prepared it. The Collaboration created a Feature Table Definition (note 2) that
outlines legal features and syntax for the DDBJ, EMBL, and GenBank feature tables. The
purpose of this document is to standardize annotation across the databases. The
presentation and format of the data differ among the three databases; however, the
underlying biological information is the same.
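The update rule and the daily exchange described above can be pictured with a small model. This is only a sketch under stated assumptions: the Entry class, the function names, and the accession "X00001" are hypothetical and do not reflect the software the databases actually run. It shows an update being accepted only from the database that prepared the entry, and a sync step that leaves all three sites with the same set of entries.

```python
from dataclasses import dataclass

# The three collaborating databases named in the text.
DATABASES = ("DDBJ", "EMBL", "GenBank")


@dataclass(frozen=True)
class Entry:
    accession: str  # hypothetical identifier, for illustration only
    owner: str      # database that initially prepared the entry
    sequence: str


def update_entry(entry: Entry, requesting_db: str, new_sequence: str) -> Entry:
    """Return an updated entry, but only if the owning database asks for it."""
    if requesting_db not in DATABASES:
        raise ValueError(f"unknown database: {requesting_db}")
    if requesting_db != entry.owner:
        raise PermissionError(
            f"{entry.accession} belongs to {entry.owner}; "
            f"{requesting_db} may not update it"
        )
    return Entry(entry.accession, entry.owner, new_sequence)


def daily_exchange(sites: dict[str, dict[str, Entry]]) -> None:
    """Give every site a copy of each entry as held by its owning database."""
    authoritative: dict[str, Entry] = {}
    for db, entries in sites.items():
        for accession, entry in entries.items():
            if entry.owner == db:  # trust only the owner's copy
                authoritative[accession] = entry
    for db in list(sites):
        sites[db] = dict(authoritative)


if __name__ == "__main__":
    original = Entry("X00001", "EMBL", "ACGT")
    sites = {"DDBJ": {}, "EMBL": {"X00001": original}, "GenBank": {}}
    daily_exchange(sites)
    print(sorted(db for db, held in sites.items() if "X00001" in held))
```

Because each entry has exactly one owner in this model, two sites can never hold competing edits of the same record, which is the conflict the rule is meant to prevent.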
3. Confidentiality of Data
When scientists submit data to GenBank, they have the opportunity to keep their data
confidential for a specified period of time. This helps to allay concerns that the availability
of their data in GenBank before publication may compromise their work. When the article
containing the citation of the sequence or its Accession number is published, the sequence
record is released. The database staff request that submitters notify GenBank of the date of
publication so that the sequence can be released without delay. The request to release
should be sent to gb-admin@ncbi.nlm.nih.gov.
4. Direct Submissions
The typical GenBank submission consists of a single, contiguous stretch of DNA or RNA
sequence with annotations. The annotations are meant to provide an adequate
representation of the biological information in the record. The GenBank Feature Table
Definition [http://www.ncbi.nlm.nih.gov/collab/FT/index.html] describes the various
features and subsequent qualifiers agreed upon by the International Nucleotide Sequence
Database Collaboration.
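As a rough illustration of how a feature and its qualifiers attach annotation to a span of sequence, the sketch below models one feature and prints it in a layout resembling a flat-file feature table. The Feature class, the format_feature helper, the gene name "abc", and the exact column widths are assumptions made for this example; the Feature Table Definition remains the authority on legal keys, qualifiers, and layout.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    key: str        # feature key, e.g. "gene"
    location: str   # span on the nucleotide sequence, e.g. "1..300"
    qualifiers: dict[str, str] = field(default_factory=dict)


def format_feature(feature: Feature) -> str:
    """Render a feature as text: key and location first, then one qualifier per line.

    Assumed layout: the key is indented 5 spaces and padded so the location
    starts in column 22; each qualifier line is indented to the same column.
    """
    lines = [f"     {feature.key:<16}{feature.location}"]
    for name, value in feature.qualifiers.items():
        lines.append(f'{" " * 21}/{name}="{value}"')
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical annotation: a gene named "abc" spanning bases 1 to 300.
    print(format_feature(Feature("gene", "1..300", {"gene": "abc"})))
```

Keeping the qualifiers in a mapping separate from the key and location mirrors the way the text distinguishes features from the qualifiers that describe them.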
Currently, only nucleotide sequences are accepted for direct submission to
GenBank. These include mRNA sequences with coding regions, fragments of genomic
DNA with a single gene or multiple genes, and ribosomal RNA gene clusters. If part of the
nucleotide sequence encodes a protein, a conceptual translation, called a CDS (coding
sequence), is annotated. The span of the CDS feature is mapped to the nucleotide sequence
encoding the protein. A protein accession number (/protein_id) is assigned to the translation
product, which will subsequently be added to the protein databases. Multiple sequences can
be submitted together. Such batch submissions of non-related sequences may be processed
2 [http://www.ncbi.nlm.nih.gov/collab/FT/index.html]