Protein Mutations            PMD       Protein Mutant Database
Gene Expressions             GEO       Gene Expression Omnibus
Amino Acid Indices           AAindex   Amino Acid Index Database
Protein/Peptide Literature   LITDB     Literature database for proteins and peptides
Gene Catalog                 GENES     KEGG Genes Database
The nucleotide sequence databases and PubMed represent the extremes of the spectrum from
sequences of base pairs to their relevance in disease and the practice of medicine. Other online
databases, such as the protein sequence database SWISS-PROT and the Online Mendelian
Inheritance in Man (OMIM) database (a molecular disease database that links human genes and
genetic disease), provide data that fall somewhere between the two ends of the spectrum. For
example, SWISS-PROT contains sequence motifs (where a motif is a small structural element that is
recognizable in several proteins, such as the alpha helix) that are often associated with particular
functions, linking structure and function. Popular examples of so-called alignment databases
are PROSITE and BLOCKS, which hold sequence motif and motif alignment data, respectively.
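To make the idea of a sequence motif concrete, here is a minimal sketch that translates a pattern written in PROSITE's pattern notation into a Python regular expression and scans a protein sequence with it. It assumes only a common subset of the notation (`x` for any residue, `[...]` for allowed residues, `{...}` for excluded residues, `(n)` or `(n,m)` repeat counts, and `<`/`>` terminal anchors) and rejects anything else. The function name `prosite_to_regex` and the test sequence are hypothetical, and the pattern is modeled on PROSITE's C2H2 zinc-finger signature purely for illustration.

```python
import re

# One PROSITE element: a residue class plus an optional repeat count.
# Supported subset (an assumption): x, single residue, [..], {..}, (n), (n,m).
ELEMENT_RE = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Convert a simplified PROSITE pattern into a Python regex string."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):   # '<' anchors the motif at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):     # '>' anchors the motif at the C-terminus
            suffix, element = "$", element[:-1]
        m = ELEMENT_RE.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported PROSITE element: {element!r}")
        core, lo, hi = m.groups()
        if core == "x":
            core = "."                           # any amino acid
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"       # any residue except those listed
        if lo and hi:
            core += "{" + lo + "," + hi + "}"    # x(2,4) -> .{2,4}
        elif lo:
            core += "{" + lo + "}"               # x(3) -> .{3}
        parts.append(prefix + core + suffix)
    return "".join(parts)


if __name__ == "__main__":
    zinc_finger = "C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H."
    sequence = "MKTCQACGKSFSRSDELTRHIRTHTGEK"  # hypothetical protein fragment
    regex = prosite_to_regex(zinc_finger)
    print(regex)  # C.{2,4}C.{3}[LIVMFYWC].{8}H.{3,5}H
    hit = re.search(regex, sequence)
    if hit:
        print(hit.start(), hit.group())  # 3 CQACGKSFSRSDELTRHIRTH
```

Regular expressions are a convenient vehicle here because PROSITE patterns describe fixed residue classes with bounded gaps, which map directly onto character classes and repetition counts.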
Public structural databases are represented by the Cambridge Structural Database for small
molecules and the Protein Data Bank (PDB) for macromolecules. The PDB, which is maintained by the
Research Collaboratory for Structural Bioinformatics (RCSB), includes publicly available 3D structures
of proteins, nucleic acids, and carbohydrates, as determined by X-ray crystallography and NMR
spectroscopy. The PDB serves as the source data for other databases, such as the Molecular Modeling
Database (MMDB), which is used to construct 3D images of the molecules involved.
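As a rough illustration of what serving as "source data" involves, the sketch below reads atomic coordinates from ATOM and HETATM records in the legacy fixed-column PDB file format. Any tool that renders a 3D model must perform a step like this in some form. Only a few fields are extracted, by column position. The `Atom` class, the `parse_atom_records` function, and the sample coordinates are hypothetical. A production pipeline would more likely rely on an established parser library or read the PDBx/mmCIF files that the PDB also distributes.

```python
from dataclasses import dataclass


@dataclass
class Atom:
    serial: int
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_atom_records(pdb_text: str) -> list[Atom]:
    """Extract coordinates from ATOM/HETATM lines using fixed column positions."""
    atoms = []
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM", "HETATM")):
            continue  # skip HEADER, REMARK, TER, END, and other record types
        atoms.append(Atom(
            serial=int(line[6:11]),        # columns 7-11
            name=line[12:16].strip(),      # columns 13-16
            res_name=line[17:20].strip(),  # columns 18-20
            chain=line[21],                # column 22
            res_seq=int(line[22:26]),      # columns 23-26
            x=float(line[30:38]),          # columns 31-38
            y=float(line[38:46]),          # columns 39-46
            z=float(line[46:54]),          # columns 47-54
        ))
    return atoms


# Hypothetical two-atom fragment laid out in PDB column positions.
SAMPLE = """\
HEADER    HYPOTHETICAL EXAMPLE
ATOM      1  N   MET A   1      38.198  19.582  28.998  1.00 48.01           N
ATOM      2  CA  MET A   1      38.700  20.800  28.300  1.00 47.50           C
END
"""

if __name__ == "__main__":
    for atom in parse_atom_records(SAMPLE):
        print(atom.chain, atom.res_name, atom.res_seq, atom.name, atom.x, atom.y, atom.z)
```

Slicing by column rather than splitting on whitespace matters because adjacent fields in this format can run together when values are wide.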
In addition to the public databases, there is a rapidly growing number of private databases
created and maintained by for-profit companies and laboratories associated with academic
institutions. For example, the LifeSeq database from Incyte Genomics, Inc. contains gene sequences
from humans, rats, and mice. Regardless of whether databases are public or private, most have
particular functions and uses in bioinformatics, and the construction, maintenance, and use of each
could easily fill a volume of its own. However, because of volatility in the commercial database space
and evolving associations among academic laboratories, the specifics of particular databases will
change markedly over time. For this reason, it's more important for the reader to understand the general
concepts and issues that apply to all biological databases, whether they're custom, in-house systems
or public databases administered by the federal government.
For example, one virtually universal characteristic of biological databases is the sheer size of
their contents. To the delight of the sagging post-eCommerce information technology industry, the
data-handling requirements associated with even modest biological databases often necessitate
considerable investment in hardware, software, and personnel. Consider that as of mid-2002,
GenBank, the repository of nucleotide sequences for a variety of species that forms the basis for
much bioinformatics research, contained data on over 17 billion base pairs stored in over 15 million
sequence records. Similarly, Incyte Genomics' LifeSeq commercial database contained over a
terabyte (1,000 gigabytes) of data, with a system capacity of 70 terabytes. Many companies in the
bioinformatics space have database system capacities in excess of 200 terabytes (200,000 gigabytes,
equivalent to about 310,000 CD-ROMs), in the form of multiple, refrigerator-sized racks of hard
drives. Creating archives is an inherent challenge in any database system. So is integrating
information in different formats from multiple databases. The difficulty of both tasks is compounded
by the sheer volume of data involved.
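The storage figures quoted above can be checked with simple arithmetic. The sketch below assumes decimal (SI) units, as the text does when it equates a terabyte with 1,000 gigabytes. The 650 MB CD-ROM capacity and the one-byte-per-base encoding are assumptions, not figures from the text. Because the GenBank counts are lower bounds ("over 17 billion", "over 15 million"), the derived values are approximate.

```python
"""Back-of-the-envelope storage arithmetic for the figures quoted in the text.

Assumptions (not from the source): decimal SI units, a 650 MB CD-ROM,
and one byte (one ASCII character) per stored base.
"""

MB = 10**6
GB = 10**9
TB = 10**12


def cd_rom_equivalents(capacity_bytes: int, disc_bytes: int = 650 * MB) -> float:
    """Number of CD-ROMs needed to hold capacity_bytes (assumed 650 MB discs)."""
    return capacity_bytes / disc_bytes


def mean_record_length(total_bp: int, n_records: int) -> float:
    """Average number of base pairs per sequence record."""
    return total_bp / n_records


def raw_sequence_bytes(total_bp: int, bytes_per_base: int = 1) -> int:
    """Bytes for the bare sequence characters, ignoring annotation and indexes."""
    return total_bp * bytes_per_base


if __name__ == "__main__":
    print(f"200 TB ~ {cd_rom_equivalents(200 * TB):,.0f} CD-ROMs")  # 307,692
    print(f"Mean GenBank record ~ {mean_record_length(17 * 10**9, 15 * 10**6):,.0f} bp")  # 1,133
    print(f"Raw GenBank sequence ~ {raw_sequence_bytes(17 * 10**9) / GB:.0f} GB")  # 17
```

The first result agrees with the text's estimate of about 310,000 CD-ROMs. The last counts only the bare sequence characters; stored records also carry annotation and indexes, so the actual on-disk size is larger.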
Given the central role that databases and database technology play in bioinformatics, at a minimum,
researchers, managers, and scientists in the field should not only become fluent in the language of
database technology, but should also understand how biomedical databases form the basis of all
bioinformatics research and development efforts. In addition, readers should appreciate that
database technology is most valuable in the biotech industry when it enables the integration of
research, development, clinical activity, manufacturing, and selling and marketing. Data take on
added value when they leave the confines of a workstation and become incorporated into shared