Utilizing open source software to facilitate communication of chemistry at RSC - Open Source Software in Life Science Research

conferences, science policy and the promotion of chemistry to the public.

The information-handling requirements of the publishing division have

always consumed the largest proportion of the available software

development resources, traditionally dedicated to enterprise systems to

develop robust and well-defined systems to deliver published content

to customers. Internal adoption of open source solutions was initiated

with the development of Project Prospect [1], and then extended with the

acquisition of ChemSpider [2]. ChemSpider delivered a platform

incorporating much open source software, staff expertise in

cheminformatics, and new and innovative functionality. The small

but agile in-house development team have combined commercial and

free/open source software tools to develop the platforms necessary to

deliver capabilities to the user community. This chapter will review

the systems that have been developed in-house, what they will deliver to

the community, the challenges encountered in utilizing these tools and

how they have been extended to make them fit for purpose.

3.2 Project Prospect and open ontologies

In 2002, RSC began exploring the semantic markup of chemistry articles together

with a number of other publishers, providing support for a

number of summer student projects at the Centre in Cambridge

University. This work led to an open source Experimental Data

Checker [3], which parsed the text of experimental data paragraphs

and performed validation checks on the extracted and formatted

results. This collaboration led to RSC involvement, alongside

Nature Publishing Group [4] and the International Union of

Crystallography [5], in the SciBorg project [6]. The resulting tool,

OSCAR [7] (Open Source Chemistry Analysis Routines), was explored as a means of

marking up chemical text and linking concepts and chemicals with

other resources, and was ultimately used as the

text mining service underpinning the award-winning Project Prospect

[1] (see Figure 3.1).

It was essential to develop a solution that was both flexible and cost-effective

during this project. Software development was started from scratch,

using standards where possible, but still facing numerous unknowns.

Licensing a commercial product for semantic markup would have been

difficult to justify and also risked both inflexibility and potential

limitations in terms of rapid development. As a result, it was decided to
