Biomedical Engineering Reference
added to EC2. Elastic Block Store (EBS) is a persistent data store that can be
attached to an instance and read from and written to much like an external
hard drive on a physical computer, making data transfer easier than to S3. As with
S3 buckets, EC2 instances can be launched in different geographic regions to decrease
network latency. When running, instances can be accessed by using a network
address provided when the instance is launched. Additionally, an IP address
can be assigned to an instance and an instance can include a Web server to
allow for Web access to programs running on the instance. For additional
security, a virtual private cloud using virtual private network (VPN) technology
can be created. This allows institutions to connect to the AWS cloud as
though it were part of the institution's network. To monitor and control
running instances, Amazon offers the CloudWatch monitoring service, Elastic
Load Balancing, and Auto Scaling.
An important computational resource for bioinformatics is the Amazon
Elastic MapReduce service. Elastic MapReduce is built on the Hadoop framework.
Hadoop is a system that creates a compute cluster from a collection of virtual
instances. It supports data-intensive distributed applications by providing a
distributed file system that allows individual nodes to share data, along with job
tracker and task tracker functions that oversee the analysis of the data by the
individual instances. The MapReduce service takes problems that can be
broken down to smaller elements and automates their analysis. These so-called
embarrassingly parallel problems are characterized by having data elements
that can be analyzed independently from the entire data set. A good example
from proteomics is peptide identification from mass spectra. A mass spectroscopy
run can be broken down into individual spectra. Each of the spectra
can be compared to the peptide sequence database to find the best match in
the database, and the results from the individual searches can be combined to
produce the final search results. MapReduce automates the splitting of the
data and its independent analysis (the “map” function), the establishment and
oversight of the worker Hadoop cluster instances, and the combination of the
results produced by the individual workers (the “reduce” function).
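The peptide identification example can be sketched in a few lines of Python. This is only an illustration of the map and reduce pattern, not the Elastic MapReduce or Hadoop API: a local thread pool stands in for the worker instances, and the database `PEPTIDE_DB`, the mass values, the `TOLERANCE` setting, and the simple peak-counting score are all hypothetical placeholders for a real search engine.

```python
from collections import defaultdict
from multiprocessing.dummy import Pool  # thread pool standing in for worker instances

# Hypothetical toy database: peptide sequence -> theoretical fragment masses.
PEPTIDE_DB = {
    "PEPTIDE": [98.06, 227.10, 324.16, 425.20],
    "PROTEIN": [98.06, 254.16, 383.20, 484.25],
    "SAMPLER": [88.04, 159.08, 290.12, 387.17],
}
TOLERANCE = 0.02  # assumed mass tolerance for calling a peak a match


def score(observed, theoretical):
    """Count theoretical masses matched by some observed peak within TOLERANCE."""
    return sum(any(abs(o - t) <= TOLERANCE for o in observed) for t in theoretical)


def map_spectrum(spectrum):
    """Map step: search one spectrum independently of all the others."""
    spectrum_id, peaks = spectrum
    best = max(PEPTIDE_DB, key=lambda pep: score(peaks, PEPTIDE_DB[pep]))
    return best, spectrum_id


def reduce_matches(pairs):
    """Reduce step: combine the per-spectrum results into one table."""
    result = defaultdict(list)
    for peptide, spectrum_id in pairs:
        result[peptide].append(spectrum_id)
    return {pep: sorted(ids) for pep, ids in result.items()}


def identify(spectra, workers=2):
    """Split the run into spectra, map them over the workers, then reduce."""
    with Pool(workers) as pool:
        pairs = pool.map(map_spectrum, spectra)
    return reduce_matches(pairs)


# A made-up run of four spectra: (spectrum id, observed peak masses).
RUN = [
    ("scan1", [98.06, 227.11, 324.15, 425.20]),
    ("scan2", [88.04, 159.08, 290.13]),
    ("scan3", [98.05, 254.16, 383.21, 484.25]),
    ("scan4", [98.06, 227.10, 324.16]),
]

if __name__ == "__main__":
    print(identify(RUN))
```

Because `map_spectrum` never looks at any spectrum other than its own, the spectra can be handed to any number of workers in any order, which is what makes the problem embarrassingly parallel.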
There are other AWS services available that can be used in concert with
EC2 and S3. These include message management services such as Amazon
Simple Queue Service (Amazon SQS), which allows instances to exchange
messages and coordinate the parallel analysis of data, and Amazon Simple
Notification Service (Amazon SNS), which allows running instances to send
messages to other instances, servers, or end users that subscribe to the
messages from the instances. This allows workflows composed of AWS instances
to respond to events. Additionally, AWS offers two database services. Amazon
SimpleDB is a simple nonrelational database that provides easy access to data
with a high degree of availability and scalability. For more demanding needs,
AWS also offers a relational database service, Amazon Relational Database
Service (Amazon RDS), which provides a cloud-based relational database
equivalent to MySQL and is compatible with applications that use MySQL.
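The queue-based coordination described above for Amazon SQS can be illustrated locally. The sketch below does not call any AWS API: Python's in-process `queue.Queue` stands in for a message queue, threads stand in for instances, and the function name `run_workers` and its parameters are hypothetical.

```python
import queue
import threading


def run_workers(tasks, analyze, n_workers=3):
    """Hand tasks to workers through a shared queue and collect their results."""
    task_q = queue.Queue()    # local stand-in for a task message queue
    result_q = queue.Queue()  # local stand-in for a results queue

    def worker():
        while True:
            task = task_q.get()
            if task is None:  # sentinel message: no more work for this worker
                break
            result_q.put((task, analyze(task)))

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for task in tasks:
        task_q.put(task)
    for _ in threads:
        task_q.put(None)      # one sentinel per worker
    for t in threads:
        t.join()

    # All workers have finished, so the results queue can be drained safely.
    results = {}
    while not result_q.empty():
        task, value = result_q.get()
        results[task] = value
    return results


if __name__ == "__main__":
    print(run_workers([1, 2, 3, 4], lambda x: x * x))
```

The workers never talk to each other directly; they only take messages from one queue and post to another, which is the property that lets independently running instances share a workload.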