Biomedical Engineering Reference
Crossbow, it provides a graphical user interface to make constructing and
running the virtual cluster easier.
Many of the available sequence aligners depend on a reference genome against
which the individual short reads are compared. When no reference genome is
available, de novo assembly must be carried out instead. Contrail is an
example of a de novo DNA assembly program that uses cloud computing to
merge overlapping short reads into larger assemblies [23]. Contrail uses Hadoop
to divide the work among multiple worker nodes, and an innovative algorithm
that represents the graph structures on disk rather than in memory, allowing
the method to scale to larger genomes.
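To make the graph-based idea concrete, here is a minimal in-memory sketch. It assumes a de Bruijn-style graph whose nodes are (k-1)-mers and whose edges come from read k-mers. A map step emits edges, a shuffle groups them by node, and a reduce step builds each node's successor list. An unbranched path is then walked into a contig. Contrail itself keeps the graph on disk across Hadoop jobs. The function names (`emit_edges`, `shuffle`, `build_graph`, `walk_contig`) and the toy reads are hypothetical.

```python
from collections import defaultdict
from itertools import chain

def emit_edges(read, k):
    """Map step: each k-mer becomes an edge from its (k-1)-mer prefix to its (k-1)-mer suffix."""
    for i in range(len(read) - k + 1):
        kmer = read[i:i + k]
        yield kmer[:-1], kmer[1:]

def shuffle(pairs):
    """Group values by key, as Hadoop's shuffle/sort phase does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def build_graph(reads, k):
    """Reduce step: one node per (k-1)-mer with its de-duplicated successors."""
    pairs = chain.from_iterable(emit_edges(r, k) for r in reads)
    return {node: sorted(set(succs)) for node, succs in shuffle(pairs).items()}

def walk_contig(graph, start):
    """Extend from `start` while the path stays unbranched (one way in, one way out)."""
    in_degree = defaultdict(int)
    for succs in graph.values():
        for s in succs:
            in_degree[s] += 1
    contig, node, seen = start, start, {start}
    while len(graph.get(node, [])) == 1:
        nxt = graph[node][0]
        if in_degree[nxt] != 1 or nxt in seen:  # stop at a branch or a cycle
            break
        contig += nxt[-1]  # each step along an edge adds one new base
        seen.add(nxt)
        node = nxt
    return contig

reads = ["ATGGCG", "GGCGTA", "CGTACC"]  # overlapping reads from ATGGCGTACC
graph = build_graph(reads, k=3)
print(walk_contig(graph, "AT"))  # ATGGCGTACC
```

Because every map and reduce call touches only one read or one node, the same steps could be distributed across worker nodes. Real assemblers must also handle sequencing errors, repeats, and reverse complements, all of which this sketch ignores.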
CloudBurst is another program that uses cloud computing for short-read DNA
mapping [24]. Based on the RMAP short-read mapping program, CloudBurst also
uses MapReduce and Hadoop to create and manage parallel instances that
speed the analysis of next-generation high-throughput sequencing data.
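The following is a minimal single-process sketch of seed-and-extend read mapping expressed as map, shuffle, and reduce steps. It is offered as an illustration of the general MapReduce pattern, not as CloudBurst's code. It assumes equal-length reads and ungapped alignments that count mismatches only. Seeds are exact substrings. By the pigeonhole principle, a read split into `max_mm + 1` non-overlapping seeds must match at least one seed exactly if it aligns with at most `max_mm` mismatches. All function names and the toy data are hypothetical.

```python
from collections import defaultdict

def reference_seeds(reference, s):
    """Map over the reference: every overlapping s-mer, keyed by its sequence."""
    for pos in range(len(reference) - s + 1):
        yield reference[pos:pos + s], ("ref", pos)

def read_seeds(read_id, read, s):
    """Map over a read: non-overlapping s-mers, tagged with their offset in the read."""
    for offset in range(0, len(read) - s + 1, s):
        yield read[offset:offset + s], ("read", read_id, offset)

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def map_reads(reference, reads, max_mm):
    read_len = len(next(iter(reads.values())))  # assumption: all reads share one length
    s = read_len // (max_mm + 1)                # seed length from the pigeonhole argument
    groups = defaultdict(list)                  # shuffle: seed -> all its occurrences
    for key, val in reference_seeds(reference, s):
        groups[key].append(val)
    for rid, read in reads.items():
        for key, val in read_seeds(rid, read, s):
            groups[key].append(val)
    hits = set()  # a set, because several seeds can find the same alignment
    for values in groups.values():  # reduce: one call per shared seed
        ref_positions = [v[1] for v in values if v[0] == "ref"]
        read_occurrences = [v[1:] for v in values if v[0] == "read"]
        for rid, offset in read_occurrences:
            read = reads[rid]
            for pos in ref_positions:
                start = pos - offset  # extend the seed hit to a full-length alignment
                if start >= 0 and start + len(read) <= len(reference):
                    mm = hamming(read, reference[start:start + len(read)])
                    if mm <= max_mm:
                        hits.add((rid, start, mm))
    return sorted(hits)

reference = "ACGTACGTTGCAAGGCTTAC"
reads = {"r1": "TGCAAGGC", "r2": "ACGTTGCT"}
print(map_reads(reference, reads, max_mm=1))  # [('r1', 8, 0), ('r2', 4, 1)]
```

Grouping by seed is what lets the expensive extension step be spread across reducers. Each reducer only compares reads and reference positions that share an exact seed.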
Developed at the University of Maryland, Quake uses Hadoop to correct errors
in high-throughput sequencing results by examining the k-mer frequencies
present in the short reads [25]. From these frequencies, it determines the
most likely sequencing errors and how to correct them, yielding greater
accuracy.
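The core idea can be sketched as follows. K-mers that occur often across the data set are probably real, and rare k-mers probably contain an error. The sketch deliberately simplifies Quake: it uses plain counts with a fixed, hypothetical `cutoff` and tries only single-base substitutions. Quake itself weights counts by base quality and chooses the coverage threshold from the data. The function names and the toy reads are illustrative assumptions.

```python
from collections import Counter

def kmers(read, k):
    return [read[i:i + k] for i in range(len(read) - k + 1)]

def count_kmers(reads, k):
    """Count every k-mer across all reads (the step that parallelizes naturally)."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts

def is_trusted(read, k, counts, cutoff):
    """A read is trusted when every one of its k-mers meets the coverage cutoff."""
    return all(counts[km] >= cutoff for km in kmers(read, k))

def correct_read(read, k, counts, cutoff):
    """Return the read unchanged, its unique single-base correction, or None."""
    if is_trusted(read, k, counts, cutoff):
        return read
    candidates = []
    for i, base in enumerate(read):
        for alt in "ACGT":
            if alt != base:
                fixed = read[:i] + alt + read[i + 1:]
                if is_trusted(fixed, k, counts, cutoff):
                    candidates.append(fixed)
    # Ambiguous or uncorrectable reads are reported as None rather than guessed.
    return candidates[0] if len(candidates) == 1 else None

reads = ["ACGTTGCAAG"] * 5 + ["ACGTAGCAAG"]  # last read carries a T->A error
counts = count_kmers(reads, k=4)
print(correct_read("ACGTAGCAAG", 4, counts, cutoff=3))  # ACGTTGCAAG
```

The erroneous base creates several k-mers seen only once. Only one substitution makes all of the read's k-mers trusted again, so the correction is unambiguous.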
Other next-generation sequencing programs have taken a similar approach on
the Microsoft Azure cloud platform. Azure, however, approaches cloud
computing differently. Rather than focusing on running instances, Azure runs
applications as either Web roles or worker roles. Web role applications are
exposed to the outside world through standard Hypertext Transfer Protocol
(HTTP) interfaces such as REST or Simple Object Access Protocol (SOAP) Web
services, and the worker nodes communicate directly with the Web application.
The Azure system monitors the workload and creates more worker nodes as
necessary to carry out the task without intervention from the user. The
virtual machine (VM) instances communicate with each other through queues
and other technologies that are part of the Windows Azure fabric.
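The Web-role/worker-role split with queue-based communication can be illustrated by a small sketch. It does not use the Azure SDK. Python's in-process `queue.Queue` and threads stand in for Azure queues and worker instances. The scaling rule `workers_needed` is a hypothetical stand-in for the platform creating more worker nodes, and computing GC content per read is only a placeholder task.

```python
import queue
import threading

def gc_content(read):
    """Placeholder analysis task: fraction of G and C bases in a read."""
    return (read.count("G") + read.count("C")) / len(read)

def web_role_submit(reads, task_queue):
    """Stand-in for the HTTP-facing Web role: accept reads, enqueue one task each."""
    for i, read in enumerate(reads):
        task_queue.put((i, read))

def worker_role(task_queue, result_queue):
    """Stand-in for a worker role: pull tasks until a None sentinel arrives."""
    while True:
        item = task_queue.get()
        if item is None:
            break
        idx, read = item
        result_queue.put((idx, gc_content(read)))

def workers_needed(backlog, per_worker=2, max_workers=8):
    """Hypothetical scaling rule: one worker per `per_worker` queued tasks."""
    return max(1, min(max_workers, -(-backlog // per_worker)))

def run(reads):
    tasks, results = queue.Queue(), queue.Queue()
    web_role_submit(reads, tasks)
    n = workers_needed(tasks.qsize())
    threads = [threading.Thread(target=worker_role, args=(tasks, results)) for _ in range(n)]
    for t in threads:
        t.start()
    for _ in threads:
        tasks.put(None)  # one sentinel per worker, queued after all real tasks
    for t in threads:
        t.join()
    out = {}
    while not results.empty():
        idx, value = results.get()
        out[idx] = value
    return [out[i] for i in range(len(reads))]

print(run(["GGCC", "ATAT", "GCAT"]))  # [1.0, 0.0, 0.5]
```

Because the Web role and the workers share only queues, workers can be added or removed without changing the submitting code. That decoupling is the point of the queue-based design described above.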
23.4 SUMMARY
Although many bioinformatics tools are distributed as free and open-source
software, they are often difficult to install and have significant dependencies
such as Web servers and database systems. This setup and configuration is
often poorly documented and can require significant expertise and
experimentation to produce a fully functional system. When analysis platforms
are set up in the cloud, they can be saved as machine images that can then be
shared publicly. This allows creators of bioinformatics applications to
distribute their work in a form that can be used without investment in setup
and architecture. Groups can now collaborate and use the same analysis
programs at far lower cost and with higher levels of security than before.