Biomedical Engineering Reference
In-Depth Information
Figure 3.3 Example of figure in article defining compounds (Reproduced by
permission of The Royal Society of Chemistry)
labels in the manuscript with a link to its chemical structure in ChemSpider
- this is the basic aim of the ChemDraw Digester.
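The aim stated above, attaching a ChemSpider link to each compound label in the manuscript, can be illustrated with a minimal Python sketch. This is not the Digester's own code (which was written in C#). The function name `link_labels`, the HTML output form, the rule that a label is a whole word, and the URL pattern are all assumptions made for illustration, and the IDs in the usage note are placeholders.

```python
import re
from typing import Dict

# Assumed URL pattern for a ChemSpider compound page; check it before relying on it.
CHEMSPIDER_URL = "http://www.chemspider.com/Chemical-Structure.{csid}.html"


def link_labels(text: str, label_to_csid: Dict[str, int]) -> str:
    """Wrap each known compound label in an HTML link to its ChemSpider page."""
    if not label_to_csid:
        return text
    # Match a label only as a whole word, so "12" is not linked inside "12a".
    alternatives = "|".join(re.escape(label) for label in label_to_csid)
    pattern = re.compile(r"(?<!\w)(" + alternatives + r")(?!\w)")

    def replace(match: re.Match) -> str:
        label = match.group(1)
        url = CHEMSPIDER_URL.format(csid=label_to_csid[label])
        return f'<a href="{url}">{label}</a>'

    return pattern.sub(replace, text)
```

For instance, `link_labels("Compound 2b was prepared from 2a.", {"2a": 111, "2b": 222})` wraps both labels in links while leaving the rest of the sentence untouched. A production version would also need to escape HTML and to distinguish compound labels from ordinary numbers in the running text.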
The most crucial part of this digestion process is to find each compound
in the original ChemDraw file, match it up with its corresponding label,
and then convert its 2D molecular structure into the MDL MOLfile
format [43] (with extension .mol). The conversion from ChemDraw to
MOL format is required so that the files can be concatenated to make an
MDL SDF file (with extension .sdf) suitable for deposition to ChemSpider.
This SDF file [44] is also supplemented with article publication details in
its associated data fields, which are used during deposition to create links
from the new and existing compound pages in ChemSpider back to the
source RSC article. Once deposited to ChemSpider, the ChemSpider IDs of
the compounds can be retrieved and used to mark up their names and
references in the source article with reverse links to the ChemSpider
compound pages. The ChemDraw format is, unfortunately, not an open
standard, and it is not straightforward to digest in order to extract and
convert the chemical structures and their associated labels. It is a binary
file format, and although there is good documentation [45], deciphering
it directly would be a painstaking process requiring considerable effort.
Fortunately, as discussed previously, there is an existing routine to convert
ChemDraw files to SDF using the 'convert' function of OpenBabel [46].
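The concatenation step described above, in which individual MOL files are joined into one SDF file with publication details carried in the data fields, can be sketched in a few lines of Python. This is an illustrative sketch rather than the Digester's C# implementation: the function names, the field names `LABEL` and `ARTICLE_DOI`, the DOI value, and the hand-written single-atom MOL block are all hypothetical placeholders.

```python
from typing import Dict, List, Tuple

# Hand-written single-atom V2000 MOL block, used only as a placeholder structure.
EXAMPLE_MOL = """methane
  sketch

  1  0  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
M  END
"""


def mol_to_sdf_record(molblock: str, fields: Dict[str, str]) -> str:
    """Wrap one MOL block and its data fields as a single SDF record."""
    lines = molblock.rstrip("\n").split("\n")
    if lines[-1].strip() != "M  END":
        raise ValueError("MOL block must end with 'M  END'")
    for name, value in fields.items():
        lines.append(f"> <{name}>")  # data header line
        lines.append(str(value))     # data value
        lines.append("")             # blank line closes the data item
    lines.append("$$$$")             # record delimiter
    return "\n".join(lines) + "\n"


def build_sdf(compounds: List[Tuple[str, Dict[str, str]]]) -> str:
    """Concatenate (molblock, fields) pairs into one SDF document."""
    return "".join(mol_to_sdf_record(mol, fields) for mol, fields in compounds)


if __name__ == "__main__":
    # Hypothetical field names and values; a real deposition defines its own.
    fields = {"LABEL": "1a", "ARTICLE_DOI": "10.0000/placeholder"}
    print(build_sdf([(EXAMPLE_MOL, fields)]))
```

Each record keeps its MOL block intact, appends one data item per field, and ends with the `$$$$` delimiter, which is what allows many compounds from one article to travel in a single file.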
The ChemDraw Digester was written using Visual Studio and the .NET
Framework as a C# service with an ASPX/C# web front-end so that
ultimately it can be reintegrated with the main ChemSpider web site. As a