and element lookups in the formula. Other approaches include checking for
stereochemistry in the name but absence of stereochemistry in the structures,
and using name-to-structure conversion tools to convert names to structures
and look for ambiguity collisions. Although these automated approaches are
of value for assisting in the validation of millions of identifiers, the most
rigorous checks, especially in terms of trade names, come from visual inspection
by users of the ChemSpider database and application of online curation tools.
ChemSpider users who wish to assist in curating the data are required to
register on the system so that potential vandalism of the data can be policed.
Curators use intuitive approaches to approve and remove identifiers using a
series of simple check boxes. Each such operation sends an e-mail to a
centralized master curator inbox for further checking by one or more master
curators, who can further approve or disallow the suggested validations to the
identifiers. A full tracking log of all such edits is maintained on the database.
Such curations are made to the database on a daily basis, and the quality of
the validated identifier dictionary improves incrementally as a result.
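One of the automated checks mentioned above, a name that carries
stereochemistry paired with a structure that carries none, can be sketched
as follows. This is a minimal illustration only: it assumes structures are
held as SMILES strings, the descriptor pattern is a deliberately small
heuristic, and the function names are hypothetical rather than part of
ChemSpider.

```python
import re

# Stereo descriptors commonly seen in chemical names. This short list is an
# illustrative assumption, not an exhaustive or official rule set.
STEREO_IN_NAME = re.compile(
    r"\((?:\d+[a-z]?)?[RSEZ](?:,(?:\d+[a-z]?)?[RSEZ])*\)"  # (R), (2S,3R), (E)
    r"|\b(?:cis|trans)\b"                                   # cis-, trans-
    r"|\((?:\+|-)\)"                                        # (+), (-)
    r"|\b[DL]-"                                             # D-, L-
)


def name_claims_stereo(name: str) -> bool:
    """Return True if the name contains a recognised stereo descriptor."""
    return STEREO_IN_NAME.search(name) is not None


def smiles_has_stereo(smiles: str) -> bool:
    """Return True if the SMILES string carries stereo marks.

    In SMILES, '@' marks tetrahedral centres and '/' or '\\' mark
    double-bond geometry.
    """
    return any(ch in smiles for ch in "@/\\")


def flag_stereo_mismatch(name: str, smiles: str) -> bool:
    """Flag a name that asserts stereochemistry the structure lacks."""
    return name_claims_stereo(name) and not smiles_has_stereo(smiles)


if __name__ == "__main__":
    examples = [
        ("(R)-2-chlorobutane", "CCC(C)Cl"),
        ("(2S)-2-aminopropanoic acid", "C[C@H](N)C(=O)O"),
        ("trans-2-butene", "CC=CC"),
        ("ethanol", "CCO"),
    ]
    for name, smiles in examples:
        print(name, smiles, flag_stereo_mismatch(name, smiles))
```

A flagged pair is only a candidate for review; a curator would still need to
decide whether the name or the structure is at fault.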
As soon as names are validated, they are used afresh to query the
integrated services associated with a chemical record so that new data will be
retrieved from PubMed, Google Patents, Google Scholar, and so on. For example,
a particular chemical record may initially have no associated hits from PubMed,
but approval of one or more identifiers would then trigger a lookup against the
appropriate Web service and immediately retrieve a related hit list. There are
risks with these approaches in that different chemicals can have the same
associated identifiers, and users should be cautious and check the associated
data. This case is particularly challenging for abbreviations, though
procedures have been instituted to limit such issues as far as possible. The
integration to search against external resources using identifiers will be
discussed in further detail later in this chapter.
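The collision risk described above can be made concrete with a short sketch
that groups identifiers by the records they are attached to and reports any
identifier shared by more than one record. The record IDs, the input layout,
and the function name are assumptions made for illustration; the text does
not describe how ChemSpider implements its own procedures.

```python
from collections import defaultdict


def find_ambiguous_identifiers(pairs):
    """Return identifiers attached to more than one record.

    `pairs` is an iterable of (record_id, identifier) tuples. Identifiers
    are compared case-insensitively after trimming whitespace.
    """
    owners = defaultdict(set)
    for record_id, identifier in pairs:
        owners[identifier.strip().lower()].add(record_id)
    return {name: sorted(ids) for name, ids in owners.items() if len(ids) > 1}


if __name__ == "__main__":
    # Hypothetical record IDs; "DMF" is a real-world ambiguous abbreviation
    # (dimethylformamide and dimethyl fumarate).
    pairs = [
        (1, "DMF"),
        (1, "N,N-dimethylformamide"),
        (2, "dmf"),
        (2, "dimethyl fumarate"),
    ]
    print(find_ambiguous_identifiers(pairs))
```

An identifier reported here would be a poor query term for an external
service, since the hits retrieved could belong to either record.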
22.2.6.4 Physicochemical Data
Physicochemical data play a defining role in the activity of chemical
compounds through properties such as log P, log D, and aqueous solubility, to
name only a few. The pharmaceutical industry uses such properties in its
in silico screening approaches via the judicious application of the Lipinski
Rule of Five [70] and other such filters. When such physicochemical data can be
sourced as experimental data from databases, they are captured and listed
against the chemical records. Where possible, links are retained to the
original sources of the data so that they can be investigated should there be
any questions regarding the validity of the data.
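As an illustration of how such a filter operates, the sketch below applies
the commonly stated Rule of Five thresholds (molecular weight no greater
than 500, log P no greater than 5, no more than 5 hydrogen-bond donors, and
no more than 10 hydrogen-bond acceptors), tolerating at most one violation.
The class, the function names, and the property values in the example are
illustrative assumptions and are not taken from ChemSpider.

```python
from dataclasses import dataclass


@dataclass
class Properties:
    molecular_weight: float
    log_p: float
    h_bond_donors: int
    h_bond_acceptors: int


def lipinski_violations(p: Properties) -> list:
    """Return the names of the Rule of Five criteria that `p` violates."""
    violations = []
    if p.molecular_weight > 500:
        violations.append("molecular_weight")
    if p.log_p > 5:
        violations.append("log_p")
    if p.h_bond_donors > 5:
        violations.append("h_bond_donors")
    if p.h_bond_acceptors > 10:
        violations.append("h_bond_acceptors")
    return violations


def passes_rule_of_five(p: Properties, max_violations: int = 1) -> bool:
    """A compound passes if it breaks no more than `max_violations` criteria."""
    return len(lipinski_violations(p)) <= max_violations


if __name__ == "__main__":
    # Approximate, illustrative values only.
    small = Properties(180.2, 1.2, 1, 4)
    large = Properties(800.0, 6.5, 6, 12)
    print(passes_rule_of_five(small), lipinski_violations(small))
    print(passes_rule_of_five(large), lipinski_violations(large))
```

The same pattern extends to other property filters by adding further
threshold checks to the list.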
Most of the compounds in the ChemSpider database do not have such properties
measured, and prediction algorithms are therefore used to predict them. The
list of predicted properties includes boiling point, flash point, log P, log D
(at two physiological pHs), number of rotatable bonds, number of proton donors,
number of proton acceptors, and other related properties. The ability to search
the entire database using such properties as filters has been enabled, and this
is an excellent way to narrow a particular structure set from a query when, for