Biomedical Engineering Reference
with no specific focus. Databases built with a specific focus are generally
quite small, a few hundred to thousands of compounds only, highly curated,
painstakingly assembled, and developed with a particular class of chemists in
mind. Data aggregators and repositories are commonly much larger, tens of
thousands to millions of compounds, and hold data that are likely to be
heavily contaminated with errors; although easy to search, they can commonly
deliver misleading results. As a result, the Internet hosts information that
is hard to filter, difficult to segregate, and at best challenging to
interpret in terms of quality. It is worth reviewing some of the databases
available online prior to discussing some of the challenges, advantages, and
approaches to linking together chemistry on the Web.
22.2.1 PubChem
The PubChem database [3] was launched by the National Institutes of Health
in 2004 as part of a suite of databases to support its roadmap initiative [4].
PubChem archives and organizes information about the biological activities
of chemical compounds and is intended to empower the scientific community
to use low-molecular-weight chemical compounds in their research. PubChem
consists of three databases (PubChem Compound, PubChem Substance, and
PubChem BioAssay). As of August 2010 its content is approaching 72 million
substances and 29 million unique structures, but biological property
information is available for only a fraction of these compounds, just over
450,000 in total.
PubChem Substance contains records of substances submitted by depositors,
which include publishers, chemical vendors, commercial databases, and
other sources. It provides descriptions of chemicals and links to PubMed [5],
protein three-dimensional (3D) structures, and screening results. PubChem
BioAssay contains information about bioassays using specific terms pertinent
to the bioassay. PubChem can be searched by alphanumeric text such as
chemical names, by property ranges, or by structure, substructure, or
structural similarity.
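To make the idea of a structural similarity search concrete, the following is a minimal sketch in Python. It assumes that each compound is represented by a fingerprint (the set of "on" bit positions) and that similarity is scored with the Tanimoto coefficient, a common choice for fingerprint comparison. The function names, the toy fingerprints, and the threshold are hypothetical and chosen for illustration only; this is not PubChem's own implementation or interface.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto coefficient: shared 'on' bits divided by total distinct 'on' bits."""
    if not a and not b:
        return 0.0  # convention chosen for this sketch: two empty fingerprints score 0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def similarity_search(query: frozenset, library: dict, threshold: float = 0.7) -> list:
    """Return (name, score) pairs at or above the threshold, best match first."""
    scored = ((name, tanimoto(query, fp)) for name, fp in library.items())
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Toy data: hypothetical bit positions, not real chemical fingerprints.
library = {
    "compound_a": frozenset({1, 2, 3, 4}),
    "compound_b": frozenset({1, 2, 3, 9}),
    "compound_c": frozenset({7, 8}),
}
query = frozenset({1, 2, 3, 4})

hits = similarity_search(query, library, threshold=0.5)
for name, score in hits:
    print(f"{name}: {score:.2f}")
```

With these toy fingerprints, compound_a matches the query exactly, compound_b shares three of five distinct bits, and compound_c shares none and falls below the threshold. A real search would derive the fingerprints from chemical structures with a cheminformatics toolkit rather than entering them by hand.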
Such a source of data opens up new possibilities for data mining and
extraction, and the system has an important role as a central repository
for chemical vendors and content providers, enabling evaluation of commercial
compound libraries. This saves biomedical researchers the work of gathering
and searching commercial databases, and the hit-to-lead decision-making
process in drug discovery programs can certainly benefit from the ongoing
annotation service provided by PubChem. PubChem is an
example of collaboration between chemists and biologists, as PubChem itself
is only a repository platform and the data themselves need to be deposited
onto it. These data come from national screening centers, chemical vendors,
and other databases. Because the data are deposited in a central resource,
pharmaceutical companies, universities, and other organizations with an
interest in mining, aggregating, and linking the data can download and reuse
them. This is highly beneficial to the efforts to link together information, but the