24.3.1 Data Standardization and Interoperability
Sharing of information and data requires well-developed standards and
exchange formats, as discussed in Chapter 13. An example in cheminformatics
is the extensible Chemical Markup Language (CML) [32], an approach for
managing primarily molecular data that has since been extended to cover
other entities, including reactions and spectra. Another example is the
Human Proteome Organization-Proteomics Standards Initiative (HUPO-
PSI) molecular interaction format for the representation of molecular
interaction data [33]. The advantage of standardized file formats is that
applications can share information without loss of data, and it is becoming
increasingly common that data must be deposited in public repositories in
open exchange formats prior to publication in scientific journals.
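To make the idea of an open exchange format concrete, the following minimal sketch reads a small CML-style XML fragment with the Python standard library. The fragment (a water molecule) is hypothetical and hand-written for illustration; its element and attribute names follow common CML usage but are not validated against the CML schema, and the helper name read_molecule is an assumption of this sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical CML-style fragment for water, written by hand for illustration.
# It is not validated against the official CML schema.
CML_DOC = """<molecule xmlns="http://www.xml-cml.org/schema" id="m1">
  <atomArray>
    <atom id="a1" elementType="O"/>
    <atom id="a2" elementType="H"/>
    <atom id="a3" elementType="H"/>
  </atomArray>
  <bondArray>
    <bond atomRefs2="a1 a2" order="1"/>
    <bond atomRefs2="a1 a3" order="1"/>
  </bondArray>
</molecule>"""

# Namespace prefix map used in the path expressions below.
NS = {"cml": "http://www.xml-cml.org/schema"}


def read_molecule(xml_text):
    """Return (atoms, bonds) from a CML-style molecule element.

    atoms maps atom id -> element symbol; bonds is a list of atom id pairs.
    """
    root = ET.fromstring(xml_text)
    atoms = {
        atom.get("id"): atom.get("elementType")
        for atom in root.iterfind("cml:atomArray/cml:atom", NS)
    }
    bonds = [
        tuple(bond.get("atomRefs2").split())
        for bond in root.iterfind("cml:bondArray/cml:bond", NS)
    ]
    return atoms, bonds


atoms, bonds = read_molecule(CML_DOC)
print(atoms)
print(bonds)
```

Because the atoms and bonds are explicitly tagged, any application that understands the format can recover the same structure, which is the practical meaning of sharing information without loss of data.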
The cheminformatics community is slowly moving toward a more classical
standardization of knowledge: the use of ontologies (see also Chapter 12).
Ontologies are formal representations that are used to define concepts and
their relationships in a specific domain. By explicitly defining what a term
means, an ontology also defines how the term should be used. Likewise,
knowledge expressed with terms defined in ontologies is more precise, as
others can look up the exact meaning.
Ontologies vary in their level of detail and, in its simplest form, an
ontology is a controlled vocabulary. An example is the International
Union of Pure and Applied Chemistry (IUPAC) Gold Book [34], which
specifies chemical terminology. More detailed ontologies, such as those used
by the knowledge management community, define terms in much more detail,
identifying classes, the hierarchy of classes (as used, e.g., by the Gene
Ontology [35, 36]), and relationships between classes. For example, a chemical ontology
can specify what a molecule is, that it is a subclass of chemical entities, that a
molecule can have a boiling point, and that a boiling point is a physical
property of a chemical entity. These are representative of the type of facts
that are expressed in domain ontologies. Ontologies have been used in
chemistry since at least the 1980s [37] but have received renewed interest
lately [38-40], possibly triggered by the open Extensible Markup Language
(XML) [41] and the
Web Ontology Language (OWL) standards [42].
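The molecule and boiling point example above can be sketched as a handful of subject-predicate-object statements. The term names, the predicate names subClassOf and hasProperty, and the helper functions are all hypothetical choices made for this illustration. The lookup is a deliberately naive walk over the class hierarchy and is not a substitute for OWL semantics or a real reasoner.

```python
# Hypothetical mini-ontology as (subject, predicate, object) statements.
# Term and predicate names are illustrative, not taken from a published ontology.
TRIPLES = {
    ("Molecule", "subClassOf", "ChemicalEntity"),
    ("BoilingPoint", "subClassOf", "PhysicalProperty"),
    ("ChemicalEntity", "hasProperty", "PhysicalProperty"),
}


def superclasses(term, triples):
    """Return all direct and inherited superclasses of term."""
    found = set()
    frontier = [term]
    while frontier:
        current = frontier.pop()
        for subject, predicate, obj in triples:
            if subject == current and predicate == "subClassOf" and obj not in found:
                found.add(obj)
                frontier.append(obj)
    return found


def can_have(term, prop, triples):
    """True if term (or a superclass) may carry prop (or a superclass of prop)."""
    holders = {term} | superclasses(term, triples)
    kinds = {prop} | superclasses(prop, triples)
    return any(
        (holder, "hasProperty", kind) in triples
        for holder in holders
        for kind in kinds
    )


print(superclasses("Molecule", TRIPLES))
print(can_have("Molecule", "BoilingPoint", TRIPLES))
```

The second call returns True only because the statements say that a molecule is a chemical entity and that a boiling point is a physical property; nothing about molecules and boiling points is stated directly. Making such inherited facts derivable is what distinguishes a class hierarchy from a plain controlled vocabulary.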
Ongoing community efforts to define ontologies related to cheminformatics
include the OpenTox API mentioned earlier, the Blue Obelisk Descriptor
Ontology, and the Chemical Information Ontology, a cheminformatics-oriented
ontology [43]. Another ontology recently introduced to simplify building
knowledge bases is the open exchange format QSAR-ML [44], which aims at
representing data sets for quantitative structure-activity relationships
(QSARs) in an open and completely reproducible way. In QSARs, chemical
structures are described by numerical vectors (known as descriptors), and
QSAR-ML makes use of the Blue Obelisk Descriptor Ontology for uniquely
defining these descriptors. Also included is support for multiple, alternative
implementations of these descriptors, which could be available on the local