11.3.1 MSML
MoSGrid supports three chemical domains: quantum chemistry, molecular dynamics, and docking, which are described in detail in subsequent sections. A multitude of programs exist in each domain. MoSGrid exploits the core functionality common to every domain by implementing a common description for these domains, while at the same time taking care of the unique and distinct properties of each domain. This description, the Molecular Simulation Markup Language (MSML), enables abstract and generic simulation descriptions to be defined. It is used to store simulation metadata, for example, a description of the simulation setup or of the results. MSML is based on the Chemical Markup Language (CML) (Murray-Rust 1999): a subset of CML was used and partly extended to provide advanced features such as enumerations. The dictionaries and conventions published with CML 4 (Murray-Rust 2003) are the most important CML features for MoSGrid. Dictionaries define controlled vocabularies, that is, they set the allowed terms. This avoids different terms with the same meaning, so a scientist searching for a term will find everything related to that meaning. Conventions, on the other hand, specify a defined structure for CML documents. Relations and constraints are defined between entries of a dictionary.
Specific syntax and semantics can be filled into the structure provided by CML 4. This structure basically contains a header, computational requirements, a list of simulation characteristics, and a part for saving the results of simulations. Common dictionaries were created for each of the three domains. Each common dictionary thus forms an abstraction layer between the programs and the respective domain-specific concepts.
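The dictionary and convention mechanism can be illustrated with a small sketch. It assumes a hypothetical common dictionary (the `qc:` terms below are invented, not MoSGrid's actual entries) and builds a simplified CML-style document with the four parts of the structure: header, computational requirements, simulation characteristics, and results. It borrows the CML element names `module`, `parameter`, `property`, and `scalar` and the `dictRef` attribute, but omits the CML namespace and is not the real MSML schema. Any term outside the controlled vocabulary is rejected.

```python
import xml.etree.ElementTree as ET

# Hypothetical common dictionary for the quantum chemistry domain: the
# controlled vocabulary of allowed terms (names invented for illustration).
COMMON_QC_DICT = {"qc:method", "qc:basis", "qc:charge", "qc:totalEnergy"}

# The four parts of the document structure; the "msml:" labels are assumptions.
SECTIONS = ["header", "requirements", "simulation", "results"]


def build_msml(parameters, results, dictionary):
    """Build a simplified CML-style document; reject terms not in the dictionary."""
    for term in [*parameters, *results]:
        if term not in dictionary:
            raise ValueError(f"{term!r} is not in the controlled vocabulary")

    # Top-level module tagged with a hypothetical convention name.
    root = ET.Element("module", {"convention": "hypothetical:msml"})
    parts = {
        name: ET.SubElement(root, "module", {"dictRef": f"msml:{name}"})
        for name in SECTIONS
    }
    # Input settings go into the simulation part, outputs into the results part.
    for term, value in parameters.items():
        param = ET.SubElement(parts["simulation"], "parameter", {"dictRef": term})
        ET.SubElement(param, "scalar").text = str(value)
    for term, value in results.items():
        prop = ET.SubElement(parts["results"], "property", {"dictRef": term})
        ET.SubElement(prop, "scalar").text = str(value)
    return root


if __name__ == "__main__":
    doc = build_msml({"qc:method": "B3LYP", "qc:basis": "6-31G"}, {}, COMMON_QC_DICT)
    print(ET.tostring(doc, encoding="unicode"))
    try:
        # A synonym outside the vocabulary is refused rather than stored.
        build_msml({"qc:functional": "B3LYP"}, {}, COMMON_QC_DICT)
    except ValueError as err:
        print("rejected:", err)
```

Refusing `qc:functional` instead of accepting it as a synonym of `qc:method` is what keeps searches reliable: every document uses the same term for the same concept.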
A specific dictionary has to be created for every program. Such a dictionary references the common dictionary of the respective domain. Together, the specific and common dictionaries are used by an adapter routine to translate an abstract MSML description into program-specific input.
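A minimal sketch of the adapter idea, assuming a hypothetical program `progX` whose input consists of `KEY = value` lines. The hypothetical program-specific dictionary references terms of the common dictionary, maps each one to a program keyword and, where needed, maps enumerated values (such as method names) to the program's own spelling. All terms, keywords, and values are invented for illustration; real adapters target the input formats of real programs.

```python
# Hypothetical common dictionary of the quantum chemistry domain.
COMMON_QC_TERMS = {"qc:method", "qc:basis", "qc:charge"}

# Hypothetical program-specific dictionary for "progX". Each entry references
# a common term and gives the program keyword plus an optional mapping for
# enumerated values.
PROGX_DICT = {
    "qc:method": ("METHOD", {"B3LYP": "b3lyp", "HF": "rhf"}),
    "qc:basis": ("BASIS", None),
    "qc:charge": ("CHARGE", None),
}

# A specific dictionary may only reference terms of the common dictionary.
assert set(PROGX_DICT) <= COMMON_QC_TERMS


def adapt(abstract_params, specific_dict):
    """Translate abstract (common-dictionary) parameters into program input text."""
    lines = []
    for term, value in abstract_params.items():
        if term not in specific_dict:
            raise KeyError(f"no program mapping for {term!r}")
        keyword, value_map = specific_dict[term]
        if value_map is not None:
            value = value_map[value]  # translate an enumerated value
        lines.append(f"{keyword} = {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    params = {"qc:method": "B3LYP", "qc:basis": "6-31G", "qc:charge": 0}
    print(adapt(params, PROGX_DICT), end="")
```

Supporting a second program would, in this sketch, only require a second specific dictionary; the abstract description stays unchanged.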
11.3.2 Parser and Adapter
As a central data format, MSML allows for interoperability between applications, jobs, workflows, and data formats. For this purpose, format converters to and from MSML were developed. Three types of converters need to be supported: structure parsers, which convert chemical input formats such as PDB to MSML; general parsers, which convert unstructured program output to MSML; and adapters, which convert between MSML and the specific input formats required by applications.
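A general parser has to pull values out of free-form program output. A minimal sketch, assuming a hypothetical log format in which the program prints lines such as `Total energy = -76.4089 Hartree`; the regular expression, the `qc:totalEnergy` term, and the `units` value are illustrative assumptions. The extracted value is wrapped in a CML-style `property` element.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical output line format; each real program needs its own patterns.
ENERGY_RE = re.compile(r"Total energy\s*=\s*(-?\d+(?:\.\d+)?)\s*Hartree")


def parse_energy(log_text):
    """Extract the last reported energy from unstructured output as a CML-style property."""
    energies = [float(m.group(1)) for m in ENERGY_RE.finditer(log_text)]
    if not energies:
        raise ValueError("no energy found in program output")
    # Keep the last value, e.g. the final energy of an iterative run.
    prop = ET.Element("property", {"dictRef": "qc:totalEnergy"})
    scalar = ET.SubElement(prop, "scalar", {"units": "hartree"})
    scalar.text = str(energies[-1])
    return prop


if __name__ == "__main__":
    log = "cycle 1\nTotal energy = -76.3000 Hartree\ncycle 2\nTotal energy = -76.4089 Hartree\n"
    print(ET.tostring(parse_energy(log), encoding="unicode"))
```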
A structure parser was developed in MoSGrid to convert between formats that contain chemical structures, such as PDB, SDF/MOL, and MSML. BioJava (2014) is used to implement PDB support, and the Chemistry Development Kit (CDK) (CDK 2014) is used to support SDF/MOL. A reader and a writer exist for every supported structure format. The reader converts the content of an input file to MSML; the writer converts MSML to the target output format. Because every conversion passes through MSML, only one reader/writer pair has to be developed for each new format.
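The reader/writer design can be sketched as two registries around a common intermediate representation. In this sketch a plain Python list of atoms stands in for MSML, and the two formats are the simple XYZ coordinate format and a hypothetical comma-separated one, rather than PDB or SDF/MOL, whose real parsing MoSGrid delegates to BioJava and CDK. Any registered source format can then be converted to any registered target format through the intermediate.

```python
# Registries: one reader (format -> intermediate) and one writer
# (intermediate -> format) per format. The intermediate is a list of
# (element, x, y, z) tuples standing in for MSML.
READERS = {}
WRITERS = {}


def reader(fmt):
    def register(fn):
        READERS[fmt] = fn
        return fn
    return register


def writer(fmt):
    def register(fn):
        WRITERS[fmt] = fn
        return fn
    return register


@reader("xyz")
def read_xyz(text):
    lines = text.strip().splitlines()
    count = int(lines[0])  # line 1: atom count, line 2: comment
    atoms = []
    for line in lines[2:2 + count]:
        element, x, y, z = line.split()
        atoms.append((element, float(x), float(y), float(z)))
    return atoms


@writer("xyz")
def write_xyz(atoms):
    body = "".join(f"{el} {x:.3f} {y:.3f} {z:.3f}\n" for el, x, y, z in atoms)
    return f"{len(atoms)}\n\n{body}"


@reader("csv")  # hypothetical "element,x,y,z" format
def read_csv(text):
    atoms = []
    for line in text.strip().splitlines():
        element, x, y, z = line.split(",")
        atoms.append((element.strip(), float(x), float(y), float(z)))
    return atoms


@writer("csv")
def write_csv(atoms):
    return "".join(f"{el},{x},{y},{z}\n" for el, x, y, z in atoms)


def convert(text, src, dst):
    """Convert via the intermediate: reader of src, then writer of dst."""
    return WRITERS[dst](READERS[src](text))


if __name__ == "__main__":
    water = "3\nwater\nO 0.0 0.0 0.0\nH 0.757 0.586 0.0\nH -0.757 0.586 0.0\n"
    print(convert(water, "xyz", "csv"), end="")
```

With n formats, this design needs n readers and n writers (2n converters) instead of up to n(n - 1) direct converters between every ordered pair of formats: for five formats that is 10 rather than 20, and a sixth format adds only 2 more.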