Biomedical Engineering Reference
In-Depth Information
for the data stored in S3, it is possible to limit access to a particular
computer or to a range of Internet Protocol (IP) addresses (providing access
for multiple users from the same institution), or to make the data publicly
accessible. This last option can
be used to address the growing requirement for public access to data imposed
by publication and funding agencies. Publication in some journals is contingent
on public access to the underlying data. Many journals are not prepared to
host the data on their own systems, and for some fields such as proteomics,
public repositories capable of handling the data have not yet become available.
Several large bioinformatics data sets have been made available on the AWS
system, such as the Annotated Human Genome Data provided by ENSEMBL [1] and
UniGene provided by the National Center for Biotechnology Information [2].
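The access options described above (a single computer, a range of IP addresses, or the public) can be sketched as an S3 bucket policy. The following is a minimal illustration rather than a recommended configuration: the function name `build_bucket_policy`, the bucket name `example-proteomics-data`, and the address range `192.0.2.0/24` (a range reserved for documentation) are all hypothetical, and the sketch only builds the policy document without contacting AWS.

```python
import ipaddress
import json


def build_bucket_policy(bucket, cidr=None):
    """Return an S3 bucket policy (as a dict) that allows reading objects.

    bucket: name of the bucket (hypothetical in this sketch).
    cidr:   optional IP address or CIDR range. If given, reads are allowed
            only from that range; if omitted, objects are publicly readable.
    """
    statement = {
        "Sid": "ReadObjects",
        "Effect": "Allow",
        "Principal": "*",
        "Action": "s3:GetObject",
        "Resource": f"arn:aws:s3:::{bucket}/*",
    }
    if cidr is not None:
        # Validate the range first; ip_network raises ValueError if malformed.
        # A bare address such as "192.0.2.17" becomes the single-host /32 range.
        network = ipaddress.ip_network(cidr)
        statement["Condition"] = {"IpAddress": {"aws:SourceIp": str(network)}}
    return {"Version": "2012-10-17", "Statement": [statement]}


if __name__ == "__main__":
    # One institution's address range (documentation range, an assumption).
    institutional = build_bucket_policy("example-proteomics-data", "192.0.2.0/24")
    # No range given: the data set is readable by anyone.
    public = build_bucket_policy("example-proteomics-data")
    print(json.dumps(institutional, indent=2))
    print(json.dumps(public, indent=2))
```

Applying such a policy would require AWS credentials and a client library; with boto3, for example, the JSON string would be passed to the S3 client's `put_bucket_policy` call. Account-level settings that block public access may also need to be adjusted before a public policy is accepted.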
Another advantage of hosting data and tools in the cloud, as publicly
available data or AMIs, is that doing so helps maintain institutional data
security. Moving these resources off-site prevents them from becoming points
of attack on the institution's own network. Because an AMI runs outside the
institutional firewall, access to it cannot open a security hole in that
firewall, regardless of which ports it requires.
23.2 CLOUD COMPUTING RESOURCES
The basic philosophy of cloud computing is to divorce the service (storage or
computation) from a physical resource and have it available on demand like
electricity from a wall socket. Remote network storage is storage that is kept
on remote servers and accessed using Web browsers, File Transfer Protocol
(FTP), or other clients. One of the main features that distinguishes it from
local network storage is that the physical location of the storage is often
unknown to the end user and the data are often redundantly distributed across
physical locations and sometimes even across continents. This redundant dis-
tribution of data across widely scattered resources offers protection from a
single point of failure. An individual drive failure does not cause data loss and
does not necessarily even take the data offline. Data security has been a
concern for many businesses using cloud computing. In bioinformatics
applications it is less of a concern than it is for credit card or social
security numbers, because the individual pieces of data are not in themselves
very subject to abuse. To make the data useful, the full data set is usually
required, as well as an understanding of the experiment that generated it.
Data storage
structures can vary with the needs of the investigator. Data can be stored as
individual items, such as e-mails on Hotmail, photos on Flickr, or documents
on Google Docs, or as individual files with utilities such as iDisk and
Dropbox. Amazon offers enterprise-class data storage in its S3 system. Files
are stored in “buckets” that are owned by individual users. The files can be
made public, allowing collaborators to view them, or kept private.
The key idea for cloud computing is that the user does not know where or