geographic locations, are usually maintained locally, and are connected via high-
throughput networking. The elements of a Grid are generally heterogeneous, adding
compatibility issues to those of data transfer, integration and maintenance. A Grid
architecture requires appropriate protocols, services, application programming inter-
faces, and software development kits (Foster et al., 2001). Computational Grids, with their attendant networks of people and instruments, are ideal for global-scale data mining and analysis (Craddock et al., 2008).
Grid computing facilitates the use of workflows: analysis pipelines in which the
output of one analysis feeds into the input of the next. Researchers have been car-
rying out this procedure manually since the emergence of the affordable computer,
but current workflows can be completely automated. Automated workflows are built
upon Web services.
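As an illustration of the idea, the sketch below chains two hypothetical Web services so that the output of the first becomes the input of the second. The service URLs and JSON field names are invented for the example; a real workflow engine would add scheduling, retries and provenance tracking on top of this pattern.

import json
import urllib.request

def call_service(url, payload):
    """POST a JSON payload to a Web service and return its decoded JSON reply."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=60) as reply:
        return json.load(reply)

def run_workflow(sequence):
    # Step 1: a hypothetical gene-prediction service.
    predicted = call_service("https://example.org/predict-genes",
                             {"sequence": sequence})
    # Step 2: a hypothetical annotation service, fed directly with step 1's output.
    return call_service("https://example.org/annotate",
                        {"genes": predicted["genes"]})

if __name__ == "__main__":
    print(run_workflow("ATGAAATTTGGGCCC"))

Dedicated workflow systems automate exactly this kind of chaining, adding the error handling, scheduling and provenance recording that a script like this omits.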
Web services are formally defined interfaces that allow computational resources
to be exposed in a standard, computationally comprehensible manner. Programs can
be “exposed” as Web services by adding “wrapper” code, adhering to these stan-
dards, to the core program code (a minimal sketch of such a wrapper appears after this paragraph). Web services may be hosted anywhere on the planet,
and combined seamlessly into workflows, at least in theory. In practice, the use of workflows involves practical difficulties such as availability (the Web services upon which a workflow depends may go down without warning), reliability (the builder of a workflow cedes control of its components to their programmers, and the resulting code may not perform as intended) and documentation (many Web services perform exactly as designed, but their programmers are often more focused on the code than on documenting it, making the services hard to use). Despite these drawbacks, well-designed workflows can
perform tasks that would be prohibitive in terms of time and cost if carried out man-
ually. Workflows also facilitate the automated re-analysis of data, as new datasets
become available. Some applications, such as Microbase (Flanagan et al., 2012), retain the results of previous analyses, so that only data not previously analysed need be processed when a dataset grows.
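The following sketch makes the "wrapper" idea concrete and, at the same time, illustrates this kind of incremental behaviour. It exposes a hypothetical command-line program (here called analyse_sequence) as a small HTTP service and caches each result against a hash of its input, so that data which have already been analysed are not processed again. The program name, port and cache layout are invented for illustration and are not taken from Microbase or any other system.

import hashlib
import json
import subprocess
from http.server import BaseHTTPRequestHandler, HTTPServer
from pathlib import Path

CACHE = Path("result_cache")
CACHE.mkdir(exist_ok=True)

def analyse(data):
    """Run the wrapped program, reusing a cached result when the input is unchanged."""
    key = hashlib.sha256(data).hexdigest()
    cached = CACHE / key
    if cached.exists():          # this input has already been analysed
        return cached.read_bytes()
    result = subprocess.run(["analyse_sequence"], input=data,
                            capture_output=True, check=True).stdout
    cached.write_bytes(result)
    return result

class WrapperService(BaseHTTPRequestHandler):
    """The thin 'wrapper' layer that exposes the program as a Web service."""
    def do_POST(self):
        body = self.rfile.read(int(self.headers["Content-Length"]))
        output = analyse(body)
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(json.dumps({"result": output.decode()}).encode("utf-8"))

if __name__ == "__main__":
    HTTPServer(("", 8080), WrapperService).serve_forever()

A production service would also publish a formal interface description and validate its inputs; the point here is simply that a modest amount of wrapper code, plus a result cache, is enough to make an existing program callable, and re-callable without wasted work, from automated workflows.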
Several programs exist to facilitate the construction of fully automated work-
flows; examples are Taverna (Oinn et al., 2004) and Microbase (Flanagan et al., 2012). Workflows built using these tools can be stored and shared in repositories such as MyExperiment 16 (Goble et al., 2010) (Figure 2.17).
Workflows have been applied to several large-scale problems, such as under-
standing the reaction of E. coli to oxygen (Maleki-Dizaji et al., 2009); identification of microbial habitats (Kolluru et al., 2011); and analysis of structural differences in metabolic pathways (Arrigo et al., 2007).
A relatively recent development, which makes it possible to perform computational analysis on an unprecedented scale, is Cloud computing. Cloud com-
puting includes “both the applications delivered as services over the Internet and the
hardware and systems software in the data centres that provide those services”
16 http://www.myexperiment.org/.