CHEMSPIDER: A PLATFORM FOR CROWDSOURCED COLLABORATION TO CURATE DATA DERIVED FROM PUBLIC COMPOUND DATABASES - Collaborative Computational Technologies for Biomedical Research


data must be treated with caution, as there are no quality control processes in place and numerous scientists have commented on the quality of the data within PubChem [6-8]. Screening data are less rigorous than those in peer-reviewed articles and contain many false positives [9]. Deposited data are not curated, so mistakes in structures, identifiers, units, and other characteristics can and do occur. The author of this chapter has frequently questioned the accuracy of some of the identifiers associated with PubChem compounds [10-12], and an example will be given later in this chapter. The problems arise from the quality of submissions from the various data sources: there are thousands of errors in the structure-identifier associations due to this contamination, and these can lead to the retrieval of incorrect chemical structures. It is also common for a single compound to have multiple structural representations because its stereochemistry is incomplete or missing entirely [13].
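The two failure modes described above (one identifier pointing at several different structures, and one compound appearing under several structural representations) can be screened for mechanically once each record carries a standard InChIKey. The following is a minimal Python sketch, not a description of PubChem's or ChemSpider's own procedures. It assumes the records are (identifier, InChIKey) pairs already computed by some cheminformatics toolkit, and it relies on the layout of a standard InChIKey: the first 14-character block hashes the connectivity (skeleton) layer, while the next block hashes the remaining layers, such as stereochemistry and isotopes. The function names and the placeholder keys are hypothetical.

```python
from collections import defaultdict

def connectivity_block(inchikey: str) -> str:
    # First block of a standard InChIKey: a hash of the skeleton/connectivity only.
    return inchikey.split("-")[0]

def conflicting_identifiers(records):
    """Identifiers (e.g. names) that are attached to more than one distinct skeleton."""
    skeletons = defaultdict(set)
    for identifier, key in records:
        skeletons[identifier].add(connectivity_block(key))
    return sorted(ident for ident, skels in skeletons.items() if len(skels) > 1)

def stereo_variant_groups(records):
    """Skeletons that occur under more than one full InChIKey.

    Such keys share connectivity but differ in a later block, typically because
    stereochemistry is defined in one record and missing in another.
    """
    keys = defaultdict(set)
    for _, key in records:
        keys[connectivity_block(key)].add(key)
    return {skel: sorted(ks) for skel, ks in keys.items() if len(ks) > 1}

# Placeholder keys in InChIKey layout; they do not correspond to real compounds.
records = [
    ("compound-A", "AAAAAAAAAAAAAA-UHFFFAOYSA-N"),
    ("compound-A", "BBBBBBBBBBBBBB-UHFFFAOYSA-N"),  # same name, different skeleton
    ("compound-B", "CCCCCCCCCCCCCC-UHFFFAOYSA-N"),
    ("compound-B", "CCCCCCCCCCCCCC-DDDDDDDDSA-N"),  # same skeleton, different second block
]

print(conflicting_identifiers(records))  # ['compound-A']
print(stereo_variant_groups(records))
```

A name flagged by the first check is a candidate identifier-structure mismatch; a skeleton flagged by the second usually points to records that differ only in stereochemistry (or another non-connectivity layer). Neither check decides which record is correct; both only narrow down what needs manual review.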

22.2.2 DrugBank

DrugBank [14] blends bioinformatics and cheminformatics data, combining detailed drug (i.e., chemical) data with comprehensive drug target (i.e., protein) information. The database contains more than 4800 drug entries and more than 2500 protein or drug target sequences that are linked to these drug entries. Each DrugCard entry contains almost 100 data fields, half devoted to drug/chemical data and the other half to drug target or protein data. The database is fully searchable, supporting extensive text, sequence, chemical structure, and relational query searches. DrugBank has been used to facilitate in silico drug target discovery, drug design, drug docking or screening, drug metabolism prediction, drug interaction prediction, and general pharmaceutical education.

The group hosting DrugBank also hosts a series of other curated databases: the Human Metabolome Database [15] contains detailed information about small-molecule metabolites found in the human body and is used by scientists working in metabolomics, clinical chemistry, and biomarker discovery; FoodDB [16] is a comprehensive database providing information on over 1900 food components, drawn from the U.S. Food and Drug Administration (FDA) list of everything added to food in the United States.

The author of this chapter has reviewed the data within DrugBank and, while efforts have been made to curate the data, there are numerous examples of inaccurate chemical structures associated with particular compounds and a distinct lack of expected stereochemistry for many of the chemical structures [13].
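The drug-to-target linking described for DrugBank is, in data terms, a many-to-many relationship, and that is what makes relational queries such as "which other drugs share a target with this one?" possible. The following Python sketch uses the standard-library `sqlite3` module with a deliberately tiny, hypothetical schema (tables `drug`, `target`, `drug_target`) and made-up rows; it is not DrugBank's actual schema, field set, or data.

```python
import sqlite3

# Hypothetical, heavily simplified schema: drugs, targets, and a link table
# expressing the many-to-many drug-target relationship. Not DrugBank's schema.
SCHEMA = """
CREATE TABLE drug   (drug_id TEXT PRIMARY KEY, name TEXT, smiles TEXT);
CREATE TABLE target (target_id TEXT PRIMARY KEY, name TEXT, sequence TEXT);
CREATE TABLE drug_target (
    drug_id   TEXT REFERENCES drug(drug_id),
    target_id TEXT REFERENCES target(target_id),
    PRIMARY KEY (drug_id, target_id)
);
"""

def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database filled with illustrative, made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO drug VALUES (?, ?, ?)", [
        ("D1", "drug_one", "CCO"),
        ("D2", "drug_two", "CC(=O)O"),
        ("D3", "drug_three", "c1ccccc1"),
    ])
    conn.executemany("INSERT INTO target VALUES (?, ?, ?)", [
        ("T1", "target_alpha", "MKT"),
        ("T2", "target_beta", "MAL"),
    ])
    conn.executemany("INSERT INTO drug_target VALUES (?, ?)", [
        ("D1", "T1"), ("D2", "T1"), ("D3", "T2"),
    ])
    return conn

def drugs_sharing_target(conn: sqlite3.Connection, drug_id: str) -> list:
    """Names of other drugs linked to at least one target of `drug_id`."""
    rows = conn.execute(
        """
        SELECT DISTINCT d2.name
        FROM drug_target AS a
        JOIN drug_target AS b ON a.target_id = b.target_id
        JOIN drug AS d2 ON d2.drug_id = b.drug_id
        WHERE a.drug_id = ? AND b.drug_id <> a.drug_id
        ORDER BY d2.name
        """,
        (drug_id,),
    ).fetchall()
    return [name for (name,) in rows]

conn = build_demo_db()
print(drugs_sharing_target(conn, "D1"))  # ['drug_two']
```

Keeping the link in its own table, rather than as a list field inside each drug record, lets the same query run in either direction (drugs for a target, or targets for a drug) without duplicating data.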

22.2.3 SureChem

SureChem [17] provides chemically intelligent searching of a patent database containing millions of U.S., European, and World patents. Using extraction heuristics to identify chemical and trade names and conversion of the extracted
