Biomedical Engineering Reference
how the resources they want to use exist or what physical form they take, only
that they want to use a particular resource.
Network storage of data allows for computer virtualization. A machine
image (in Amazon's case, an Amazon Machine Image, or AMI) is stored in S3.
A user can invoke the image, which causes it to run as a virtual computer.
Because images can be prebuilt with different uses in mind, it is possible to
maintain different images for different specialized purposes.
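The idea of keeping one prebuilt image per purpose and invoking it on demand can be sketched as follows. This is a minimal illustration, not a prescribed method: it assumes a client object that follows the `run_instances` call of the boto3 EC2 client (boto3 is not mentioned in the text), and the image IDs, purpose names, and the `launch_for_purpose` helper are hypothetical placeholders.

```python
# Hypothetical mapping from a purpose to the ID of an image prebuilt for it.
# The IDs below are placeholders, not real machine images.
IMAGES_BY_PURPOSE = {
    "sequence-alignment": "ami-00000000000000001",
    "genome-assembly": "ami-00000000000000002",
}


def launch_for_purpose(ec2_client, purpose, instance_type="m1.small"):
    """Start one virtual computer from the image prebuilt for `purpose`.

    `ec2_client` is assumed to expose `run_instances` with the same keyword
    arguments as the boto3 EC2 client; `instance_type` is an assumed default.
    """
    image_id = IMAGES_BY_PURPOSE[purpose]
    response = ec2_client.run_instances(
        ImageId=image_id,
        InstanceType=instance_type,
        MinCount=1,
        MaxCount=1,
    )
    # The response lists the instances that were started; return the first ID.
    return response["Instances"][0]["InstanceId"]
```

With boto3 installed and credentials configured, the client could be created with `boto3.client("ec2")` and passed in as `ec2_client`. Passing the client as an argument keeps the sketch independent of any particular account or region.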
23.2.1 Amazon Web Services (AWS)
One of the most popular providers of cloud computing resources to the
bioinformatics community is Amazon, because AWS is both easy to use and
inexpensive. AWS is composed of a number of parts with different
functionality. Data are stored in S3 as objects that can range in size
from 1 byte to 5 GB. Data objects are stored in the equivalent of folders,
called buckets. Each object is retrieved using a key assigned by the
developer, and data are secured from unauthorized access. A bucket can be
located in one of four geographical regions, and the data objects in the
bucket are replicated across multiple servers within the region. This
protects the data from loss due to disk failure or local disaster in up to
two data centers concurrently. For increased data security, versioning is
available, which stores previous versions of files rather than overwriting
them, allowing users to roll back changes and correct errors. For data that
are also stored locally, Amazon offers a lower-cost alternative, reduced
redundancy storage (RRS), which has an expected data loss of 0.01% per year,
compared with the 10^-9% expected loss with S3. Currently, storage for the
first 50 TB of data costs US $0.15/GB per month for S3 and US $0.10/GB per
month for RRS, and transfer into S3 is free. For projects that require very
large volumes of data to be uploaded, Amazon offers a service, AWS
Import/Export, which allows a user to physically ship a disk to Amazon,
which then loads the data directly into S3. To distribute data, Amazon
offers Amazon CloudFront, which gives users a Web interface for providing
public access to objects in their S3 buckets. This could be used to give
collaborators access to data or to make published data sets publicly
available. The publisher of the data pays only for the amount of data
downloaded, so the upfront costs of hosting the data are very low.
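The trade-off between S3 and RRS described above can be made concrete with a short calculation. This is a rough sketch under stated assumptions: the prices are the per-GB figures quoted in the text and apply only within the first 50 TB, the S3 loss figure is read as 10^-9 % per year, the loss percentages are treated as the expected share of stored objects lost per year, and the function names are illustrative.

```python
# Prices quoted in the text, in US$ per GB (valid for the first 50 TB only).
S3_PRICE_PER_GB = 0.15
RRS_PRICE_PER_GB = 0.10

# Expected annual loss, converted from percent to a fraction.
# The S3 value assumes the text's figure is 10^-9 percent.
S3_ANNUAL_LOSS = 1e-9 / 100
RRS_ANNUAL_LOSS = 0.01 / 100


def storage_cost(gigabytes, price_per_gb):
    """Cost in US$ of storing `gigabytes` at a flat per-GB price."""
    return gigabytes * price_per_gb


def expected_objects_lost(n_objects, annual_loss_fraction):
    """Expected number of objects lost per year out of `n_objects` stored."""
    return n_objects * annual_loss_fraction


if __name__ == "__main__":
    size_gb = 1000
    n_objects = 10_000
    for name, price, loss in (
        ("S3", S3_PRICE_PER_GB, S3_ANNUAL_LOSS),
        ("RRS", RRS_PRICE_PER_GB, RRS_ANNUAL_LOSS),
    ):
        cost = storage_cost(size_gb, price)
        lost = expected_objects_lost(n_objects, loss)
        print(f"{name}: US ${cost:.2f}, expected objects lost per year: {lost:.2e}")
```

Under these assumptions, 1,000 GB costs about US $150 on S3 and US $100 on RRS, while 10,000 objects held in RRS correspond to roughly one expected lost object per year. That is why the text recommends RRS only for data that are also stored locally.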
Cloud computing resources are available from Amazon through its Elastic
Compute Cloud (EC2) system. EC2 makes virtual computers of different sizes
available on demand on an hourly basis. These range from small instances
with a single 32-bit compute unit, 1.7 GB of memory, and 160 GB of local
storage to extra large high-CPU instances with twenty 64-bit compute units,
7 GB of memory, and 1.6 TB of local storage. Compute clusters with 33.5
compute units and 23 GB of memory are also available. The user pays hourly
for the amount of time that the instance is running. There are a number of
other additional services that can be