Biomedical Engineering Reference
6.3.1 Molecular Structure Databases
Molecular geometries of gas-phase molecules can be determined by microwave spectroscopy
and by electron diffraction. In the solid state, structure determination is dominated by
X-ray and neutron diffraction, and a very large number of crystal structures are known.
Nuclear magnetic resonance also has a role to play, especially for proteins. These
techniques are covered in most university-level general chemistry texts.
Over the years, a vast number of molecular structures have been determined and there
are several well-known structural databases. One is the Cambridge Structural Database
(CSD), which is supported by the Cambridge Crystallographic Data Centre (CCDC). The
CCDC was established in 1965 to undertake the compilation of a computerized database
containing comprehensive data for organic and metal-organic compounds studied by X-ray
and neutron diffraction. It was originally funded as part of the UK contribution to interna-
tional data compilation. According to its mission statement, the CCDC serves the scientific
community through the acquisition, evaluation, dissemination and use of the world's output
of small-molecule crystal structures.
For each entry in the CSD, three types of information are stored. First is the bibliographic
information: who reported the crystal structure, where they reported it, and so on. Next
comes the connectivity data; this is a list showing which atom is bonded to which in the
molecule. Finally, the molecular geometry and the crystal structure are given. The molecular
geometry consists of Cartesian coordinates. The database can be easily reached through
the Internet, but individual records can only be accessed on a fee-paying basis.
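The three kinds of information described above can be pictured as a single record. The following is a minimal sketch in Python, not the actual CSD schema: the class name StructureEntry, its field names, the helper neighbours and all the values are hypothetical, and the crystal-structure data (unit cell, symmetry) are left out for brevity.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class StructureEntry:
    """Hypothetical record mirroring the three kinds of information in the text."""
    # Bibliographic information: who reported the structure and where.
    authors: List[str]
    citation: str
    # Connectivity: pairs of atom indices that are bonded to each other.
    bonds: List[Tuple[int, int]]
    # Molecular geometry: one Cartesian (x, y, z) triple per atom.
    coordinates: List[Tuple[float, float, float]]


def neighbours(entry: StructureEntry, atom: int) -> List[int]:
    """Return the indices of all atoms bonded to the given atom."""
    result = []
    for a, b in entry.bonds:
        if a == atom:
            result.append(b)
        elif b == atom:
            result.append(a)
    return sorted(result)


# Placeholder three-atom chain; these are not values from any real database record.
entry = StructureEntry(
    authors=["A. N. Author"],
    citation="placeholder citation",
    bonds=[(0, 1), (1, 2)],
    coordinates=[(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)],
)

print(neighbours(entry, 1))  # [0, 2]
```

Storing the connectivity as index pairs keeps it independent of the coordinates, so the same bond list stays valid if the geometry is later refined.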
The Brookhaven Protein Data Bank (PDB) is the single worldwide repository for the
processing and distribution of three-dimensional biological macromolecular structural data.
It is operated by the Research Collaboratory for Structural Bioinformatics. At the time of
writing, there were 49 048 structures in the databank, relating to proteins, nucleic acids,
protein-nucleic acid complexes and viruses. The databank is available free of charge.
Information can be retrieved from the main website. Each structure is represented by a
four-character alphanumeric identifier such as 1PCN. The PDB database can be searched using
a number of techniques, all of which are described in detail on the homepage.
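Before an identifier is sent to a search tool, it can be checked against the description given above. The sketch below tests only what the text states, namely four alphanumeric characters; the function name is_pdb_id is hypothetical, and the databank may impose further rules that are not checked here.

```python
import re

# Four alphanumeric characters, as described in the text.
# Assumption: no further constraints are enforced in this sketch.
PDB_ID_PATTERN = re.compile(r"[0-9A-Za-z]{4}")


def is_pdb_id(text: str) -> bool:
    """Return True if text looks like a four-character alphanumeric identifier."""
    return PDB_ID_PATTERN.fullmatch(text) is not None


print(is_pdb_id("1PCN"))   # True
print(is_pdb_id("1PCN5"))  # False
```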
6.3.2 The .pdb File Format
The PDB file format (.pdb) is widely used to report and distribute molecular structure data.
A typical .pdb file for phenylalanine would start with bibliographic data, then move on to the
Cartesian coordinates (expressed in angstroms and relative to an arbitrary reference frame)
and connectivity data as shown below.
HETATM    1  C    1    1.576   0.433   0.004
HETATM    2  C    2    1.301   0.777   0.643
HETATM    3  C    3    0.072   1.410   0.444
HETATM    4  C    4    0.898   0.848   0.394
HETATM    5  C    5    0.609   0.365   1.029
HETATM    6  C    6    0.617   1.004   0.834
HETATM    7  C    7    2.227   1.537   0.618
HETATM    8  C    8    3.352   0.904   0.217
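The HETATM records listed above can be read with a few lines of Python. This is a minimal sketch that assumes each record has seven whitespace-separated fields, as in the excerpt: the record name, a serial number, the element symbol, a fourth field whose meaning is not stated in the text (called label here as an assumption), and the x, y and z coordinates in angstroms. Complete PDB files use fixed-width columns and carry additional fields, so a whitespace split like this one would not parse them reliably. The names Atom and parse_hetatm are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Atom:
    serial: int
    element: str
    label: str   # fourth field of the excerpt; its meaning is an assumption
    x: float     # Cartesian coordinates in angstroms
    y: float
    z: float


def parse_hetatm(line: str) -> Atom:
    """Parse one simplified, whitespace-separated HETATM record."""
    fields = line.split()
    if len(fields) != 7 or fields[0] != "HETATM":
        raise ValueError(f"not a simplified HETATM record: {line!r}")
    return Atom(
        serial=int(fields[1]),
        element=fields[2],
        label=fields[3],
        x=float(fields[4]),
        y=float(fields[5]),
        z=float(fields[6]),
    )


# The eight records from the excerpt, copied as printed.
records = """\
HETATM 1 C 1 1.576 0.433 0.004
HETATM 2 C 2 1.301 0.777 0.643
HETATM 3 C 3 0.072 1.410 0.444
HETATM 4 C 4 0.898 0.848 0.394
HETATM 5 C 5 0.609 0.365 1.029
HETATM 6 C 6 0.617 1.004 0.834
HETATM 7 C 7 2.227 1.537 0.618
HETATM 8 C 8 3.352 0.904 0.217
"""

atoms = [parse_hetatm(line) for line in records.splitlines() if line.strip()]
print(len(atoms))  # 8
```

Raising an error on a malformed line, rather than skipping it silently, makes it obvious when a file does not follow the simplified layout assumed here.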