The structures within MMDB are also linked to the NCBI Taxonomy database.
Known as the PDBeast project (footnote 7), this effort makes it possible to find the following: (1) all
MMDB structures from a particular organism; and (2) all structures within a node of the
taxonomy tree (such as lizards or Bacillus), by launching the Taxonomy Browser, which
shows the number of MMDB records in each node.
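As an illustration of the organism-based lookup described above, the following minimal sketch builds a query URL for the public NCBI E-utilities ESearch service rather than the Taxonomy Browser pages. The use of E-utilities, the database name "structure", the "[Organism]" field tag, and the helper name structure_query_url are assumptions of this sketch and are not taken from the text; no network request is made.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities ESearch endpoint (an assumption of this sketch;
# the text itself describes browsing through the Taxonomy Browser).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def structure_query_url(taxon: str, retmax: int = 20) -> str:
    """Build an ESearch URL for structure records linked to a taxon.

    Hypothetical helper: it only assembles the URL and sends nothing.
    """
    params = {
        "db": "structure",             # assumed Entrez name for MMDB records
        "term": f"{taxon}[Organism]",  # assumed field tag for the organism
        "retmax": retmax,              # maximum number of IDs to return
        "retmode": "json",
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(structure_query_url("Bacillus"))
```

Fetching the resulting URL (for example with urllib.request.urlopen) should return a list of record identifiers, provided the assumptions above hold for the current service.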
The second database within the structure resources is the Conserved Domain
Database (CDD) [5], originally based largely on Pfam and SMART, collections of
alignments that represent functional domains conserved across evolution. CDD now also
contains the alignments of the NCBI COG database along with new curated alignments
assembled at NCBI. CDD can be searched from the CDD page in several ways,
including by a domain keyword search (footnote 8). Three tools have been developed to assist in
the analysis of CDD: (1) CD-Search (footnote 9), which uses a BLAST-based algorithm to search the
position-specific scoring matrices (PSSMs) of CDD alignments; (2) the CD-Browser,
which provides a graphic display of domains of interest, along with the sequence
alignment; and (3) the Conserved Domain Architecture Retrieval Tool (CDART), which
searches for proteins with similar domain architectures.
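To make the idea of searching a sequence against a position-specific scoring matrix concrete, here is a toy sketch of ungapped PSSM scoring. It is not the CD-Search algorithm itself, whose BLAST-based search involves considerably more (seeding, gaps, and statistics). The matrix values, the default score, and the function names are invented for illustration.

```python
from typing import Dict, List, Tuple

# A toy PSSM: one dict of residue -> score per alignment column.
# The numbers are invented for illustration, not taken from any CDD model.
Pssm = List[Dict[str, int]]

TOY_PSSM: Pssm = [
    {"G": 5, "A": 1},
    {"K": 4, "R": 3},
    {"S": 4, "T": 3},
]

DEFAULT_SCORE = -2  # score for any residue not listed in a column


def window_score(pssm: Pssm, window: str) -> int:
    """Sum the per-column scores for one ungapped placement."""
    return sum(col.get(res, DEFAULT_SCORE) for col, res in zip(pssm, window))


def best_hit(pssm: Pssm, sequence: str) -> Tuple[int, int]:
    """Return (offset, score) of the best ungapped placement in sequence."""
    width = len(pssm)
    if len(sequence) < width:
        raise ValueError("sequence is shorter than the PSSM")
    scored = [
        (window_score(pssm, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    ]
    # Prefer the highest score; on ties, prefer the leftmost offset.
    score, offset = max(scored, key=lambda t: (t[0], -t[1]))
    return offset, score


if __name__ == "__main__":
    print(best_hit(TOY_PSSM, "MAGKSLL"))
```

Unlike a fixed substitution matrix, each column here carries its own scores, which is what lets a PSSM reward residues that are conserved at specific positions of a domain alignment.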
All the above databases and tools are discussed in more detail in other parts of
this document, including tips on how to make the best use of them.
2. Content of the Molecular Modeling Database (MMDB)
2.1 Sources of Primary Data
To build MMDB [1], 3D structure data are retrieved from the PDB database [6]
administered by the Research Collaboratory for Structural Bioinformatics (RCSB). In all
cases, the structures in MMDB have been determined by experimental methods,
primarily X-ray crystallography and Nuclear Magnetic Resonance (NMR) spectroscopy.
Theoretical structure models are omitted. The data in each record are then checked for
agreement between the atomic coordinates and the primary sequence, and the sequence
data are then extracted from the coordinate set. The resulting association between
sequence and structure allows the record to be linked efficiently into searches and
alignment displays involving other NCBI databases.
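The consistency check described above can be sketched as follows. This is a simplified illustration, not the actual MMDB procedure: it assumes the fixed-column PDB text format, reads the declared sequence from SEQRES records, derives a second sequence from the residues that appear in ATOM records, and compares the two. The function names and the four-atom sample record are hypothetical; real entries often have residues missing from the coordinates, which this sketch would simply report as a disagreement.

```python
from typing import Iterable, List

THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def _one_letter(names: Iterable[str]) -> str:
    """Map three-letter residue names to one-letter codes (X if unknown)."""
    return "".join(THREE_TO_ONE.get(name, "X") for name in names)


def seqres_sequence(lines: List[str], chain: str) -> str:
    """Declared sequence of one chain, read from SEQRES records."""
    names: List[str] = []
    for line in lines:
        if line.startswith("SEQRES") and line[11:12] == chain:
            names.extend(line[19:70].split())  # residue names start at col 20
    return _one_letter(names)


def coordinate_sequence(lines: List[str], chain: str) -> str:
    """Sequence implied by the ATOM records of one chain (first model only)."""
    names: List[str] = []
    last_key = None
    for line in lines:
        if line.startswith("ENDMDL"):
            break  # ignore further models, e.g. in NMR ensembles
        if line.startswith("ATOM") and line[21:22] == chain:
            key = (line[22:26], line[26:27])  # residue number + insertion code
            if key != last_key:               # first atom of a new residue
                names.append(line[17:20])
                last_key = key
    return _one_letter(names)


def sequences_agree(lines: List[str], chain: str) -> bool:
    """True if the coordinate-derived sequence equals the declared one."""
    return seqres_sequence(lines, chain) == coordinate_sequence(lines, chain)


# Hypothetical three-residue record, invented for illustration.
RECORD = [
    "SEQRES   1 A    3  GLY ALA SER",
    "ATOM      1  CA  GLY A   1      11.000  12.000  13.000",
    "ATOM      2  N   ALA A   2      12.000  13.000  14.000",
    "ATOM      3  CA  ALA A   2      12.500  13.500  14.500",
    "ATOM      4  CA  SER A   3      14.000  15.000  16.000",
]

if __name__ == "__main__":
    print(seqres_sequence(RECORD, "A"), coordinate_sequence(RECORD, "A"))
    print(sequences_agree(RECORD, "A"))
```

Keying each residue on its number plus insertion code means that several atoms of the same residue contribute only one letter to the coordinate-derived sequence.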
The data are converted into ASN.1 [7], which can be parsed easily and can also
accept numerous annotations to the structure data. In contrast to a PDB record, an MMDB
record in ASN.1 contains all necessary bonding information in addition to sequence
information, allowing consistent display of the 3D structure using Cn3D. The annotations
provided in the PDB record by the submitting authors are added, along with uniformly
defined secondary structure and domain features. These features support structure-based
similarity searches using VAST. Finally, two coordinate subsets are added to the record:
one containing only backbone atoms, and one representing a single-conformer model in
cases where multiple conformations or structures were present in the PDB record. Both of
these additions further simplify viewing both an individual structure and its alignments
7 [http://www.ncbi.nlm.nih.gov/Structure/PDBEAST/pdbeast.shtml]
8 [http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml]
9 [http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi]