Biomedical Engineering Reference
In-Depth Information
a variety of solvents. All of the data generated as part of this project is publicly
hosted on a GoogleDocs spreadsheet and outsiders are encouraged to explore
and mine the data, with the hope that their results also will be open. At the
time of writing the project has seen contributions from a number of people,
including chemists, mathematicians, and programmers. As the project has
grown, the measurements have come to number in the hundreds. The
spreadsheet contains alphanumeric identifiers for solutes and solvents along
with Simplified Molecular Input Line Entry Specification (SMILES)
representations and solubility data. In a number of cases, external references are also
included. While the use of Google spreadsheets is a very simple way to share
data, the nature of the data makes it unwieldy to explore. In general, while
numeric solubility data are useful, it is more appropriate to explore them from a
chemical point of view, that is, in terms of structures and substructures.
As a result, a simple Web page interface was developed [74] that extracts
data from the Google spreadsheet via the Google-provided data application
programming interface (API) and presents views of the data or filtered subsets
of the data (based on solute or solvent identifiers, substructures, or solubility
ranges). The key feature of this application is the incorporation of chemical
intelligence by making use of cheminformatics Web services hosted at Uppsala
University. By making use of these services, the SMILES strings [75] stored in
the spreadsheet could be filtered by the presence or absence of substructures
(specified via SMILES Arbitrary Target Specification [SMARTS] [76]). In
addition, the services were also employed to provide 2D structure depictions
of the results satisfying the query. From a collaborative point of view,
this application is interesting as the developer had no role in the gathering of
the solubility data and did not create the online spreadsheet. Instead, the
application made use of public data APIs provided by Google and public Web
services hosted at another, remote location to extract and present data that
satisfied the requirements of another, external group (i.e., the experimentalists
making the measurements).
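
The filtering described above can be sketched in a few lines of Python. This is a minimal illustration, not the code behind the application in [74]. The column names (solute, solvent, smiles, solubility_M), the helpers load_rows, filter_rows, and crude_acid_match, and the sample rows are assumptions made for the example, and the solubility values are placeholders rather than measurements. Substructure matching is passed in as a callable because the application delegates it to a remote cheminformatics service; the stand-in used here is a plain substring test on the SMILES string, which is not real SMARTS matching.

```python
import csv
import io

# Placeholder rows standing in for a CSV export of the shared spreadsheet.
# Column names and solubility values are invented for illustration only.
SAMPLE_CSV = """solute,solvent,smiles,solubility_M
benzoic acid,ethanol,OC(=O)c1ccccc1,1.0
benzoic acid,hexane,OC(=O)c1ccccc1,0.1
naphthalene,ethanol,c1ccc2ccccc2c1,0.5
"""


def load_rows(text):
    """Parse CSV text into dictionaries, converting solubility to float."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row["solubility_M"] = float(row["solubility_M"])
        rows.append(row)
    return rows


def filter_rows(rows, solute=None, solvent=None,
                min_sol=None, max_sol=None, matches=None):
    """Keep rows that pass every filter that was supplied.

    `matches` is any callable taking a SMILES string and returning a bool.
    In a real deployment it would wrap a substructure-search service.
    """
    kept = []
    for row in rows:
        if solute is not None and row["solute"] != solute:
            continue
        if solvent is not None and row["solvent"] != solvent:
            continue
        if min_sol is not None and row["solubility_M"] < min_sol:
            continue
        if max_sol is not None and row["solubility_M"] > max_sol:
            continue
        if matches is not None and not matches(row["smiles"]):
            continue
        kept.append(row)
    return kept


def crude_acid_match(smiles):
    """Hypothetical stand-in for a SMARTS query: a substring test only."""
    return "C(=O)" in smiles


rows = load_rows(SAMPLE_CSV)
acids_in_ethanol = filter_rows(rows, solvent="ethanol", matches=crude_acid_match)
print([row["solute"] for row in acids_in_ethanol])
```

Keeping the substructure test behind a callable mirrors the division of labor in the text: the data source, the filtering logic, and the chemical intelligence can each be supplied by a different party.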
A related application was developed by another researcher to explore the
chemical space via descriptor calculations followed by principal-component
analysis [77]. This application also made use of the cheminformatics services
hosted at Uppsala University as well as visualization services provided by
Google, allowing users to generate principal-component plots of the solubility
data and thereby understand the extent of the chemical space occupied by
the current set of chemicals.
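
The principal-component step can be illustrated with a small, dependency-free Python sketch. It is not the implementation used in [77]. The descriptor matrix below is a placeholder (rows are molecules, columns are hypothetical descriptors), and the eigenvectors are found by power iteration with deflation purely to keep the example self-contained. The two-column scores it returns are the kind of values that would be handed to a plotting service.

```python
import math
import random

# Placeholder descriptor matrix: one row per molecule, one column per
# descriptor. The values are invented and carry no chemical meaning.
DESCRIPTORS = [
    [1.0, 0.5, 2.0],
    [2.0, 1.5, 1.0],
    [3.0, 1.0, 4.0],
    [4.0, 3.0, 3.0],
]


def center(data):
    """Subtract each column's mean from that column."""
    n = len(data)
    means = [sum(col) / n for col in zip(*data)]
    return [[x - m for x, m in zip(row, means)] for row in data]


def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def mat_vec(matrix, v):
    return [sum(a * b for a, b in zip(row, v)) for row in matrix]


def leading_eigenvector(matrix, iterations=500, seed=0):
    """Power iteration; returns (eigenvalue, unit eigenvector)."""
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in matrix]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = mat_vec(matrix, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-12:  # nothing left to extract (e.g., after deflation)
            break
        v = [x / norm for x in w]
    eigenvalue = sum(a * b for a, b in zip(v, mat_vec(matrix, v)))
    return eigenvalue, v


def pca(data, n_components=2):
    """Return (scores, variances) for the leading principal components."""
    centered = center(data)
    cov = covariance(centered)
    d = len(cov)
    components, variances = [], []
    for k in range(n_components):
        value, vec = leading_eigenvector(cov, seed=k)
        components.append(vec)
        variances.append(value)
        # Deflation: subtract the component just found from the matrix.
        cov = [[cov[i][j] - value * vec[i] * vec[j] for j in range(d)]
               for i in range(d)]
    scores = [[sum(x * c for x, c in zip(row, comp)) for comp in components]
              for row in centered]
    return scores, variances


scores, variances = pca(DESCRIPTORS)
for row in scores:
    print(row)
```

Descriptors on very different scales are usually standardized before this step; that is omitted here for brevity.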
These applications highlight the fact that distributed software resources
were key in allowing multiple, unrelated parties to collaborate on a publicly
available data set. While this type of collaboration could certainly be achieved
using traditional software resources (i.e., locally installed libraries and
programs), the presence of freely accessible Web services (for cheminformatics
as well as for data and visualization) allows arbitrary individuals or groups to
develop novel applications that were not considered by the original
researchers. Furthermore, the free and distributed nature of the resources allows such