COLLABORATIVE CHEMINFORMATICS APPLICATIONS - Collaborative Computational Technologies for Biomedical Research

a variety of solvents. All of the data generated as part of this project is publicly

hosted on a Google Docs spreadsheet, and outsiders are encouraged to explore

and mine the data, with the hope that their results will also be open. At the

time of writing, the project has seen contributions from a number of people,

including chemists, mathematicians, and programmers. As the project has

grown, the number of measurements has reached the hundreds. The

spreadsheet contains alphanumeric identifiers for solutes and solvents along

with Simplified Molecular Input Line Entry Specification (SMILES) representations

and solubility data. In a number of cases, external references are also

included. While the use of Google spreadsheets is a very simple way to share

data, the nature of the data makes it unwieldy to explore. In general, while

numeric solubility data are useful, it is more appropriate to explore them from a

chemical point of view, that is, in terms of structures and substructures.
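
The spreadsheet layout described above (identifiers, SMILES strings, solubility values, and optional references) can be sketched as a small record type. This is a minimal illustration only: the column names, the assumed unit, and the sample rows are hypothetical placeholders rather than values from the project spreadsheet, and the input is assumed to be a CSV export of the sheet.

```python
import csv
import io
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Measurement:
    solute_id: str
    solute_smiles: str
    solvent_id: str
    solvent_smiles: str
    solubility: float         # unit is an assumption here (e.g. mol/L)
    reference: Optional[str]  # external reference, when one was recorded


def parse_measurements(csv_text: str) -> List[Measurement]:
    """Parse a CSV export into records, skipping rows without a numeric solubility."""
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            value = float(row["solubility"])
        except (TypeError, ValueError):
            continue  # blank or non-numeric measurement
        records.append(Measurement(
            solute_id=row["solute_id"].strip(),
            solute_smiles=row["solute_smiles"].strip(),
            solvent_id=row["solvent_id"].strip(),
            solvent_smiles=row["solvent_smiles"].strip(),
            solubility=value,
            reference=(row.get("reference") or "").strip() or None,
        ))
    return records


# Hypothetical column names and placeholder values, for illustration only.
SAMPLE_CSV = """solute_id,solute_smiles,solvent_id,solvent_smiles,solubility,reference
S001,OC(=O)c1ccccc1,V001,CCO,2.5,
S002,CC(=O)Oc1ccccc1C(=O)O,V002,CO,not measured,
S003,c1ccc2ccccc2c1,V001,CCO,0.6,lab-notebook-12
"""

measurements = parse_measurements(SAMPLE_CSV)
```

Keeping the SMILES strings next to the numeric values is what later allows the same rows to be queried by structure rather than only by number.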

As a result, a simple Web page interface was developed [74] that extracts

data from the Google spreadsheet via the Google-provided data application

programming interface (API) and presents views of the data or filtered subsets

of the data (based on solute or solvent identifiers, substructures, or solubility

ranges). The key feature of this application is the incorporation of chemical

intelligence by making use of cheminformatics Web services hosted at Uppsala

University. By making use of these services, the SMILES strings [75] stored in

the spreadsheet could be filtered by the presence or absence of substructures

(specified via SMILES Arbitrary Target Specification [SMARTS] [76]). In

addition, the services were employed to provide 2D structure depictions

of the results matching the query. From a collaborative point of view,

this application is interesting as the developer had no role in the gathering of

the solubility data and did not create the online spreadsheet. Instead, the

application made use of public data APIs provided by Google and public Web

services hosted at another, remote location to extract and present data that

satisfied the requirements of another, external group (i.e., the experimentalists

making the measurements).
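
The filtering described above (by identifier, by solubility range, and by substructure) can be sketched as follows. This is not the code of the application in [74]: the row layout and the function names are hypothetical, and the substructure test is passed in as a callable because, in the original, that step was delegated to a remote cheminformatics service. The `stub_matcher` below simply looks up precomputed answers and performs no real SMARTS matching.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Row = Dict[str, object]
SubstructureTest = Callable[[str, str], bool]  # (smiles, smarts) -> matches?


def filter_rows(
    rows: Iterable[Row],
    solute_id: Optional[str] = None,
    solvent_id: Optional[str] = None,
    solubility_range: Optional[Tuple[float, float]] = None,
    smarts: Optional[str] = None,
    matches: Optional[SubstructureTest] = None,
) -> List[Row]:
    """Return the rows that satisfy every criterion that was supplied."""
    if smarts is not None and matches is None:
        raise ValueError("a substructure matcher is required when smarts is given")
    low, high = solubility_range or (float("-inf"), float("inf"))
    selected = []
    for row in rows:
        if solute_id is not None and row["solute_id"] != solute_id:
            continue
        if solvent_id is not None and row["solvent_id"] != solvent_id:
            continue
        if not (low <= float(row["solubility"]) <= high):
            continue
        if smarts is not None and not matches(str(row["solute_smiles"]), smarts):
            continue
        selected.append(row)
    return selected


# Placeholder rows, for illustration only.
ROWS = [
    {"solute_id": "S001", "solute_smiles": "OC(=O)c1ccccc1", "solvent_id": "V001", "solubility": 2.5},
    {"solute_id": "S002", "solute_smiles": "c1ccc2ccccc2c1", "solvent_id": "V001", "solubility": 0.6},
    {"solute_id": "S003", "solute_smiles": "OC(=O)c1ccccc1", "solvent_id": "V002", "solubility": 0.1},
]

CARBOXYLIC_ACID = "C(=O)[OH]"  # SMARTS pattern used as the query

# Stand-in for the remote service: precomputed answers, no real matching.
_STUB_ANSWERS = {
    ("OC(=O)c1ccccc1", CARBOXYLIC_ACID): True,
    ("c1ccc2ccccc2c1", CARBOXYLIC_ACID): False,
}


def stub_matcher(smiles: str, smarts: str) -> bool:
    return _STUB_ANSWERS.get((smiles, smarts), False)


acids_in_range = filter_rows(
    ROWS, solubility_range=(0.5, 5.0), smarts=CARBOXYLIC_ACID, matches=stub_matcher
)
```

With these placeholder rows, only `S001` passes both the range and the substructure criteria. Injecting the matcher keeps the filtering logic independent of where the chemical intelligence is hosted, which mirrors the division of labor the text describes.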

A related application was developed by another researcher to explore the

chemical space via descriptor calculations followed by principal-component

analysis [77]. This application also made use of the cheminformatics services

hosted at Drexel University as well as visualization services provided by

Google, allowing users to generate principal-component plots of the solubility

data and thereby understand the extent of the chemical space occupied by

the current set of chemicals.
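
The idea of descriptor calculation followed by principal-component analysis can be illustrated with a short, dependency-free sketch. It is not the implementation used in [77]: the descriptor columns (molecular weight, logP, hydrogen-bond donor count) and their values are placeholders, the descriptors are autoscaled as an assumed preprocessing choice, and the components are found by power iteration with deflation, which is chosen here for brevity rather than robustness.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def standardize(data: Matrix) -> Matrix:
    """Scale each descriptor column to zero mean and unit sample variance."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    stds = []
    for j in range(d):
        var = sum((row[j] - means[j]) ** 2 for row in data) / (n - 1)
        stds.append(math.sqrt(var) or 1.0)  # constant columns stay at zero
    return [[(row[j] - means[j]) / stds[j] for j in range(d)] for row in data]


def covariance(z: Matrix) -> Matrix:
    n, d = len(z), len(z[0])
    return [[sum(r[i] * r[j] for r in z) / (n - 1) for j in range(d)] for i in range(d)]


def leading_eigenvector(m: Matrix, iterations: int = 200) -> List[float]:
    """Power iteration: repeatedly apply the matrix and renormalize."""
    start = [float(i + 1) for i in range(len(m))]  # arbitrary non-zero start
    norm = math.sqrt(dot(start, start))
    vec = [x / norm for x in start]
    for _ in range(iterations):
        nxt = [dot(row, vec) for row in m]
        norm = math.sqrt(dot(nxt, nxt))
        if norm == 0.0:
            break
        vec = [x / norm for x in nxt]
    return vec


def pca(data: Matrix, n_components: int = 2) -> Tuple[Matrix, Matrix, List[float]]:
    """Return (scores, components, variances) for the leading components."""
    z = standardize(data)
    cov = covariance(z)
    d = len(cov)
    components, variances = [], []
    for _ in range(n_components):
        vec = leading_eigenvector(cov)
        eigenvalue = dot(vec, [dot(row, vec) for row in cov])  # Rayleigh quotient
        components.append(vec)
        variances.append(eigenvalue)
        # Deflation: remove the component just found before the next pass.
        cov = [[cov[i][j] - eigenvalue * vec[i] * vec[j] for j in range(d)]
               for i in range(d)]
    scores = [[dot(row, comp) for comp in components] for row in z]
    return scores, components, variances


# Placeholder descriptor table: one row per molecule, columns are
# molecular weight, logP, and hydrogen-bond donor count (illustrative values).
DESCRIPTORS = [
    [78.1, 2.1, 0.0],
    [122.1, 1.9, 1.0],
    [128.2, 3.3, 0.0],
    [180.2, 1.2, 1.0],
    [46.1, -0.3, 1.0],
]

scores, components, variances = pca(DESCRIPTORS)
```

Each row of `scores` gives the coordinates of one molecule in the plane of the first two components, which is what a principal-component plot displays. Power iteration can converge slowly when two eigenvalues are nearly equal, so a production tool would more likely rely on a numerical library.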

These applications highlight the fact that distributed software resources

were key in allowing multiple, unrelated parties to collaborate on a publicly

available data set. While this type of collaboration could certainly be achieved

using traditional software resources (i.e., locally installed libraries and programs),

the presence of freely accessible Web services (for cheminformatics

as well as data and visualization) allows arbitrary individuals or groups to

develop novel applications that were not considered by the original researchers.

Furthermore, the free and distributed nature of the resources allows such
