Biomedical Engineering Reference
In-Depth Information
patient data records, or if demand on our Chem2Bio2RDF server
increases dramatically, we may not be able to scale searching to meet the
needs. There is thus a likely future need to permit searches to be
intelligently distributed in parallel, perhaps using cloud technologies.
18.3.3 How to organize the data
We decided to organize our data sets into six categories based on the kinds
of biological and chemical concepts they contain. These categories are:
chemical & drug (drug is a subclass of chemical), protein & gene,
chemogenomics (i.e. relating compounds to genes, through interaction
with proteins or changes in expression levels), systems (i.e. protein-protein
interactions (PPI) and pathways), phenotype (i.e. disease and side effect), and literature. However,
we did not initially develop an OWL ontology, instead depending on
'same-as' relationships between data sets (e.g. PubChem Compound X is
the same as DrugBank Drug Y). This decision was made due to the difficulty
in defining a scope for an ontology before we had a good idea how
Chem2Bio2RDF was to be used. We subsequently developed a set of use
cases that allowed us to describe a constrained, implementable ontology
for Chem2Bio2RDF. As we already had access to some other ontologies
(such as Gene Ontology), this really boiled down to a chemogenomic
ontology for describing the relationship between compounds and biological
entities. This was aligned with other related ontologies, submitted to
NCBO BioPortal, and will be described in an upcoming publication.
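
The 'same-as' approach to linking data sets can be illustrated with a minimal sketch. It assumes the links are available as simple pairs of identifier strings and groups them into equivalence sets using a union-find structure. The function name group_equivalents and the identifiers shown are hypothetical placeholders for illustration, not part of Chem2Bio2RDF itself.

```python
from collections import defaultdict


def group_equivalents(same_as_links):
    """Group identifiers joined by 'same-as' links into equivalence sets.

    same_as_links: iterable of (id_a, id_b) pairs, each asserting that the
    two identifiers refer to the same entity. Hypothetical input format.
    """
    parent = {}

    def find(x):
        # Return the representative of x's set, registering x if unseen.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in same_as_links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two sets

    groups = defaultdict(set)
    for node in parent:
        groups[find(node)].add(node)
    return [sorted(members) for members in groups.values()]


# Placeholder identifiers; X, Y, Z, A, B stand in for real accession numbers.
links = [
    ("pubchem:X", "drugbank:Y"),
    ("drugbank:Y", "chembl:Z"),
    ("pubchem:A", "drugbank:B"),
]
for group in group_equivalents(links):
    print(group)
```

Because 'same-as' is treated here as transitive, a compound linked from PubChem to DrugBank and from DrugBank to a third source ends up in a single group. Whether that transitivity is appropriate for a given pair of data sets is a curation decision the sketch does not address.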
18.3.4 Data quality and equivalence
Addressing quality is fraught with complexities in the details: for
example, is a PubChem BioAssay IC50 result comparable with one in
ChEMBL or from an internal assay? Is an experimental result always
more significant than a predicted result or an association extracted from
a journal article? What happens when we get so many links between
things that we cannot separate the signal from the noise? We are clearly
constrained by the inherent quality of the data sources available. For
Chem2Bio2RDF we decided on two principles: (1) we would not
constrain users from making their own quality decisions (e.g. by excluding
or including data sets or data types); and (2) we would not make
judgments about equivalence beyond the very basic (two compounds
equivalent in two data sets, etc.). Thus we pushed addressing primarily to
 