Biomedical Engineering Reference
Figure 22.2
Curation interface for editing chemical identifiers associated with structure.
Chemical identifiers can be added, deleted, and validated by any user. Master
curators have additional curation capabilities.
community directly with a request to provide crowdsourced support of the
project. A project was therefore undertaken to enable real-time curation of
the data by providing a simple-to-use interface for adding, removing, and
validating chemical identifiers associated with the chemical structures (see
Fig. 22.2). In parallel with the community-based curation efforts, rules-based
validation of the data was also undertaken and has resulted in the removal of
hundreds of thousands of incorrect identifiers and the creation of a large
validated name-to-structure dictionary containing well over a million identifiers.
Such a validated dictionary can be important to providing high precision for
chemical name entity extraction, as reported by Hettne et al. [59].
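The curation workflow and its effect on extraction precision can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration, not ChemSpider's actual implementation or API: the `StructureRecord` class, the helper functions, the placeholder record ID, and the naive substring matching are all hypothetical. The sketch shows identifiers being added, removed, and validated, and a lookup dictionary built only from validated names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class StructureRecord:
    """Hypothetical record: one chemical structure and its identifiers."""
    record_id: int  # placeholder ID, not a real ChemSpider ID
    # Maps identifier text -> True once a curator has validated it.
    identifiers: Dict[str, bool] = field(default_factory=dict)

    def add(self, name: str) -> None:
        # Newly added identifiers start out unvalidated.
        self.identifiers.setdefault(name, False)

    def remove(self, name: str) -> None:
        # Removing an identifier that is not present is a no-op.
        self.identifiers.pop(name, None)

    def validate(self, name: str) -> None:
        if name not in self.identifiers:
            raise KeyError(name)
        self.identifiers[name] = True


def build_validated_dictionary(records: List[StructureRecord]) -> Dict[str, int]:
    """Return a lowercase name -> record_id map of validated identifiers only."""
    lookup: Dict[str, int] = {}
    for rec in records:
        for name, is_valid in rec.identifiers.items():
            if is_valid:
                lookup[name.lower()] = rec.record_id
    return lookup


def find_names(text: str, lookup: Dict[str, int]) -> List[Tuple[str, int]]:
    """Naive matching: report dictionary names that occur in the text."""
    lowered = text.lower()
    return sorted((name, rid) for name, rid in lookup.items() if name in lowered)


rec = StructureRecord(record_id=1)
rec.add("aspirin")
rec.add("acetylsalicylic acid")
rec.add("salicylic acid")        # incorrect: this names a different compound
rec.validate("aspirin")
rec.validate("acetylsalicylic acid")
rec.remove("salicylic acid")     # curation removes the wrong identifier

lookup = build_validated_dictionary([rec])
hits = find_names("Patients received aspirin daily.", lookup)
```

Because only validated identifiers enter the lookup, a wrong name that was removed or never validated cannot produce a false match, which is the sense in which a curated dictionary supports precision. A production system would use tokenization and longest-match rules rather than plain substring search.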
Following the addition of community-based curation, facilities were then
added to enable further annotation and expansion of the data. Features were
added to allow real-time deposition of single chemical structures or batches of structures,
transaction-based predictions of physicochemical data, and the deposition of
analytical data associated with chemical structures, discussed in further detail
below.
22.2.6.2 Data Sources Data on ChemSpider can be deposited by individuals
or by organizations. A data set can be limited to a single chemical compound
deposited by a user simply to “register” it and receive a ChemSpider ID, or it
can be a single compound with accompanying spectral data, a list of
publications, measured experimental properties, and a set of chemical
identifiers, or a data collection (tens to millions of compounds) with links to
other online resources. ChemSpider is a flexible host for data. All chemical compounds,
whether singletons or collections, have a series of properties extracted or
generated automatically at deposition. These include the molecular formula,