Biomedical Engineering Reference
In-Depth Information
trists in the field of metabonomics [35] would want to search the database
using monoisotopic masses and specific data slices from the queries in order
to find metabolites, and mass spectrometry instrument vendors have
integrated with ChemSpider in order to query the database directly from the
instrument software [36, 37]. Alternatively, a medicinal chemist investigating
drug repurposing might want to search for chemicals that demonstrate affinity
for binding to a particular target using in silico approaches. By layering on
predicted LASSO values [38-41] describing ligand affinity relative to a set of
targets, chemists are able to identify potentially active ligands for further
analysis and investigation. These and other searches make ChemSpider very
flexible in its applications.
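As an illustration of the monoisotopic mass search mentioned above, the following is a minimal sketch, not ChemSpider's actual implementation. It assumes a small in-memory list of (name, molecular formula) records, simple formulas without parentheses or charges, and a hypothetical tolerance parameter `tol_ppm`. The function names `monoisotopic_mass` and `search_by_mass` are introduced here purely for illustration.

```python
import re

# Monoisotopic masses (in u) of the most abundant isotope of a few
# common elements, rounded to six decimals. Extend as needed.
MONOISOTOPIC = {
    "H": 1.007825,
    "C": 12.000000,
    "N": 14.003074,
    "O": 15.994915,
    "P": 30.973762,
    "S": 31.972071,
    "Cl": 34.968853,
}


def monoisotopic_mass(formula: str) -> float:
    """Sum monoisotopic masses for a simple formula such as 'C6H12O6'.

    Assumption: no parentheses, charges, or isotope labels.
    An element missing from MONOISOTOPIC raises KeyError.
    """
    total = 0.0
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC[symbol] * (int(count) if count else 1)
    return total


def search_by_mass(records, query_mass, tol_ppm=5.0):
    """Return names of records whose mass lies within tol_ppm of query_mass."""
    window = query_mass * tol_ppm * 1e-6
    return [
        name
        for name, formula in records
        if abs(monoisotopic_mass(formula) - query_mass) <= window
    ]


# Hypothetical records standing in for database entries.
records = [
    ("glucose", "C6H12O6"),
    ("fructose", "C6H12O6"),
    ("caffeine", "C8H10N4O2"),
]
hits = search_by_mass(records, 180.0634, tol_ppm=10.0)
```

In this sketch the query returns both glucose and fructose, since isomers share a molecular formula and therefore a monoisotopic mass. A mass search alone narrows the candidates but cannot distinguish between them.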
22.2.6.1 Structure Quality Issues. Following the deposition and aggregation
of data from a multitude of data sources, it became obvious that one of
the side effects of such an activity was that data of various levels of quality
were being merged. The challenge is in distinguishing the quality of data in a
particular collection. However, quality is difficult to define as in many cases it
is based on assertions, experimentally determined data points, and ultimately
the interpretation of data. A recent publication by this author discussed how
many natural product chemical structures are incorrectly elucidated using
analytical techniques and are initially reported in peer-reviewed publications
[42]. When such an analysis is extended to public compound
databases containing millions of chemical structures and associated data, the
issues are exponentially more complex.
This author has invested many years in examining the primary assertions
of structure-identifier relationships in order to produce disambiguation
dictionaries that entity extraction engines can use for text mining
chemistry-related articles and patents. Chemistry
is a complex subject and the accurate representation of a chemical structure
in an electronic format can be very difficult, especially when it is expected
to encapsulate the bonding details of complex bonding systems such as
organometallics. However, focusing only on small organic molecules of interest to the
life sciences, some of the most common issues identified include:
1. Chemical structures that are supposed to contain stereochemistry are
commonly drawn without stereo bonds.
2. Chemical structures are drawn with inappropriate valences or with charge
imbalance due to the absence of one or more expected counterions.
3. The relationship between a chemical compound and a particular chemical
identifier is confused in a number of ways: (a) the name includes a
counterion but the structure lacks it; (b) the name defines specific
stereochemistry but the structure omits it, shows it only partially, or
shows the opposite configuration; (c) the chemical names or registry
number(s) are simply incorrectly associated; and many other variants.
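Some of the issues listed above can be screened for with simple heuristics. The following is a rough sketch under stated assumptions, not the curation procedure used for any particular database. It assumes each record pairs a chemical name with a SMILES string, and it uses only string-level checks rather than a cheminformatics toolkit. The names `check_record`, `net_charge`, and `COUNTERION_WORDS` are hypothetical, and the counterion vocabulary is deliberately tiny.

```python
import re

# Hypothetical, deliberately small vocabulary mapping counterion words
# in a name to the element symbol expected in the structure.
COUNTERION_WORDS = {
    "hydrochloride": "Cl",
    "hydrobromide": "Br",
    "sodium": "Na",
}

# Matches stereodescriptors such as (R), (S), (E), (Z), (2S,3R).
STEREO_IN_NAME = re.compile(r"\(\d*[RSEZ](?:,\d*[RSEZ])*\)")


def net_charge(smiles: str) -> int:
    """Sum formal charges written inside SMILES bracket atoms."""
    total = 0
    for atom in re.findall(r"\[([^\]]+)\]", smiles):
        match = re.search(r"([+-]+)(\d*)", atom)
        if match:
            signs, digits = match.groups()
            magnitude = int(digits) if digits else len(signs)
            total += magnitude if signs[0] == "+" else -magnitude
    return total


def check_record(name: str, smiles: str) -> list:
    """Return a list of heuristic warnings for one name/structure pair."""
    issues = []
    # Issue 1 / 3(b): the name asserts stereochemistry, but the SMILES
    # has no tetrahedral (@) or double-bond (/ or \) stereo marks.
    has_stereo_marks = any(mark in smiles for mark in ("@", "/", "\\"))
    if STEREO_IN_NAME.search(name) and not has_stereo_marks:
        issues.append("stereodescriptor in name but no stereo in structure")
    # Issue 2: formal charges do not sum to zero.
    charge = net_charge(smiles)
    if charge != 0:
        issues.append(f"net charge {charge:+d}; counterion may be missing")
    # Issue 3(a): the name mentions a counterion absent from the structure.
    for word, element in COUNTERION_WORDS.items():
        if word in name.lower() and element not in smiles:
            issues.append(f"'{word}' in name but {element} absent from structure")
    return issues


# Example: the name claims (S) stereochemistry and a hydrochloride salt,
# while the structure is drawn flat and as the free base.
warnings = check_record(
    "(S)-propranolol hydrochloride",
    "CC(C)NCC(O)COc1cccc2ccccc12",
)
```

Checks like these only flag candidates for review. Partially defined or inverted stereochemistry, and names or registry numbers that are simply misassigned, need a structure-aware comparison and usually manual curation.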