Biomedical Engineering Reference
trying to source chemicals for purchase or garner additional information from
a particular source.
22.2.6.3 Chemical Identifiers
Chemical identifiers associated with chemical
entities can include systematic names [generated using the International
Union of Pure and Applied Chemistry (IUPAC) or other nomenclatures],
trivial names, trade names, Chemical Abstracts registry numbers, international
registration numbers, or database identifiers. Systematic, trivial, and trade
names can be multilingual. Because so many kinds of identifiers exist, tens to
hundreds of identifiers can be associated
with a single chemical entity. Chemical names in public databases are often
of dubious quality and can be ambiguous. For example, dichlorobenzene
indicates a dichloro-substituted benzene, but because the
positions of the two chlorine substituents are not specified, the name could
refer to the 1,2-, 1,3-, or 1,4-isomer. Many online
databases are only available for text-based searching and chemical name-
based searches are therefore used regularly. Since a chemical entity can be
named in various ways, disambiguation dictionaries can lead to more complete
result sets. A similar approach is used in Wikipedia for searching. For example,
thalidomide is a well-known drug due to its well-publicized teratogenic side
effects [64]. It has been marketed under a number of trade names, including Contergan and
Softenon, and searching Wikipedia for either of these names leads to
the same thalidomide page. The production of
high-quality validated disambiguation dictionaries associated with the millions
of chemical entities on ChemSpider has been one of the most successful
aspects of the project and has yielded a list of well over a million
validated identifiers. The primary utility of these validated identifiers is then
to use them as queries against one or more application programming interfaces
(APIs), such as those for PubMed [65], Google Scholar [66], Google
Patents [67], and the RSC Publishing platform [68], to retrieve hit lists.
In this manner a single chemical record on ChemSpider will
return hits from each of the platforms based on a query set drawn from a validated
disambiguation dictionary and, in general, provide a more complete result set
than would be obtained with any single text query [69].
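The idea of a disambiguation dictionary can be sketched in a few lines of Python. This is an illustrative sketch only, not ChemSpider's implementation: the `SYNONYMS` table, the function names, and the OR-joined quoted-phrase query format are all assumptions made for the example, and each real search API has its own query syntax.

```python
# Hypothetical, hand-made dictionary: one record key -> its validated names.
# The thalidomide trade names are the ones mentioned in the text.
SYNONYMS = {
    "thalidomide": ["thalidomide", "Contergan", "Softenon"],
}


def build_lookup(synonyms):
    """Invert the dictionary so that every validated name points to one record."""
    lookup = {}
    for record, names in synonyms.items():
        for name in names:
            lookup[name.casefold()] = record
    return lookup


def resolve(name, lookup):
    """Return the record key for a name, or None if the name is not validated."""
    return lookup.get(name.strip().casefold())


def query_set(record, synonyms):
    """Join every validated name for a record into one text query.

    The OR-joined quoted-phrase format is an assumed, generic syntax.
    """
    return " OR ".join(f'"{name}"' for name in synonyms[record])


LOOKUP = build_lookup(SYNONYMS)
```

With this table, `resolve("contergan", LOOKUP)` and `resolve("Softenon", LOOKUP)` both return the thalidomide record, while an ambiguous name that was never validated, such as dichlorobenzene, resolves to `None`. Sending the output of `query_set` to a text search, rather than a single name, is what gives the more complete hit list.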
The validated dictionary of chemical identifiers associated
with the ChemSpider structure set has been produced using a combination of
robotic and manual curation. Since chemical names are introduced into
the database by the deposition of data sets from various sources and of
varying quality, it is necessary to apply filters on an ongoing basis to remove obvious
errors. For example, it is quite common for chemical vendors to include only
the primary component in their structure set yet leave the counterion in the
chemical name. The result is a mismatch between the represented chemical
and the associated identifier. Many of these mismatches are easily recognized and
removed using a set of simple filters. One such filter checks for “chloride” in
the name and for “Cl” in the molecular formula; when the name contains “chloride”
but the formula contains no “Cl”, the identifier is removed. Similar approaches can be taken for many counterions
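
The chloride check described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the filter set actually used on ChemSpider: the `COUNTERION_ELEMENTS` table and the function names are hypothetical, and the formula parser only handles simple formulas without charges or isotope labels.

```python
import re

# Hypothetical table: name fragment -> element symbol expected in the formula.
COUNTERION_ELEMENTS = {
    "chloride": "Cl",
    "bromide": "Br",
    "sodium": "Na",
}


def elements_in(formula):
    """Return the set of element symbols in a simple formula such as 'C4H12ClN5'."""
    return set(re.findall(r"[A-Z][a-z]?", formula))


def name_matches_formula(name, formula):
    """False if the name mentions a counterion whose element is absent from the formula."""
    present = elements_in(formula)
    lowered = name.lower()
    return all(
        symbol in present
        for fragment, symbol in COUNTERION_ELEMENTS.items()
        if fragment in lowered
    )


def filter_identifiers(names, formula):
    """Keep only the names that are consistent with the molecular formula."""
    return [name for name in names if name_matches_formula(name, formula)]
```

For a record whose structure is the free base of metformin (C4H11N5), `filter_identifiers(["metformin", "metformin hydrochloride"], "C4H11N5")` keeps only `"metformin"`, because “hydrochloride” contains “chloride” and the formula has no Cl. The check runs in one direction only: it does not flag a salt structure that carries a free-base name.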