low, it is feasible to generate images representing each version of the system.
This can be critical for a long-running project that collects samples and data
over the course of months or years. It is important that all of the data from
the project be analyzed in precisely the same manner. By using only a single
AMI version to carry out the analysis, the user can be sure that all of the results
are comparable.
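As a rough illustration of pinning an analysis environment to a single image, the sketch below launches a compute node from one fixed AMI using the boto3 library; the AMI ID, region, and instance type are placeholders rather than values from any particular project.

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")

    # Launch one analysis node from a single, fixed AMI so that every run of
    # the pipeline uses exactly the same software stack.
    response = ec2.run_instances(
        ImageId="ami-0123456789abcdef0",   # placeholder ID of the pinned analysis AMI
        InstanceType="m5.large",           # placeholder instance type
        MinCount=1,
        MaxCount=1,
    )
    print(response["Instances"][0]["InstanceId"])

Keeping the ImageId constant for the life of the project is what guarantees that samples collected in the first month and in the final month pass through identical software.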
Another advantage of using a publicly available stored AMI is that it allows
multiple groups to have access to precisely the same analysis platform, allowing
it to serve as a standard for comparing results between the groups. Other
groups can save a copy of the AMI to their own Amazon Simple Storage Service
(S3) storage area or even download it to local storage, thereby removing any
dependence on the originating group. They can then make changes to their copy
of the AMI and either keep these changes private or return them for public
use, while the public version remains unchanged. Running a tool as an Amazon
virtual computer also has desirable security features. Some groups may be
reluctant to upload their data to a third-party website for analysis. This could
be because the data are proprietary or relate to human patients. In essence,
the virtual computer created from the AMI is the property
of the group that instantiated it, not the group that developed it. If the AMI
was properly created, then the group that developed it does not have any
access to the data analyzed by the AMI. Since all data transferred to and from
the AMI and the user's private S3 storage area can be encrypted, the data being
analyzed should be as secure as if the analysis were taking place in the user's
home data center.
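To make these points concrete, the following sketch (again using boto3, with placeholder image IDs, bucket names, and file paths) saves a private copy of a shared AMI into the group's own account and uploads input data to the group's own S3 bucket with server-side encryption; the transfers themselves go over HTTPS by default.

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")
    s3 = boto3.client("s3", region_name="us-east-1")

    # Save a private, encrypted copy of a publicly shared analysis AMI so the
    # group no longer depends on the original developers keeping it available.
    copy = ec2.copy_image(
        Name="our-analysis-ami-v1",               # placeholder name
        SourceImageId="ami-0abcdef1234567890",    # placeholder public AMI ID
        SourceRegion="us-east-1",
        Encrypted=True,                           # encrypt the copy at rest
    )

    # Upload input data to the group's private bucket; boto3 transfers use
    # HTTPS, and S3 is asked to encrypt the object at rest.
    s3.upload_file(
        Filename="samples/run_042.fastq.gz",      # placeholder local file
        Bucket="our-private-analysis-bucket",     # placeholder bucket
        Key="input/run_042.fastq.gz",
        ExtraArgs={"ServerSideEncryption": "AES256"},
    )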
There are other advantages to cloud-based analysis compared to analysis
carried out in local data centers. If a group required a number of different
analysis tools as part of its research data workflow, then it might have to set
up and maintain a separate server for each tool. This can require a significant
investment of time and resources for tools that may be used only sporadically.
Moreover, when a tool is needed, there may be a large quantity of data to
analyze, which could mean either a significant delay in obtaining the results or
the use of multiple computers to carry out the work. With local machines, these
extra servers have
to be prepared and maintained in advance. With virtual computers, the required
number of nodes can be instantiated as soon as the work arises. Since billing
is done by node-hours used, it costs the same to carry out an analysis for 100
hours on 1 node as for 1 hour on 100 nodes. This gives even small groups access
to large-scale computing resources on demand.
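A minimal sketch of this scale-out, with the same placeholder AMI ID and an assumed hourly price, is shown below; the point is only that 1 node for 100 hours and 100 nodes for 1 hour both consume 100 node-hours.

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")

    # Request 100 identical worker nodes in a single call (placeholder values;
    # real accounts may need a limit increase for this many instances).
    fleet = ec2.run_instances(
        ImageId="ami-0123456789abcdef0",
        InstanceType="c5.xlarge",
        MinCount=100,
        MaxCount=100,
    )

    # Billing is by node-hours, so splitting the work changes the wall-clock
    # time but not the cost.
    price_per_node_hour = 0.17                        # assumed illustrative rate (USD)
    serial_cost = 1 * 100 * price_per_node_hour       # 1 node for 100 hours
    parallel_cost = 100 * 1 * price_per_node_hour     # 100 nodes for 1 hour
    assert serial_cost == parallel_cost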
Data loss is also less likely with cloud-based storage. Locally stored data
are subject to disk failure. Usually this can be addressed using tape backup,
but for this to be effective the tape backups have to be tested and stored
off-site. Cloud-stored data are replicated multiple times and stored in different
geographic locations, and even across multiple continents. This ensures not only
that the data are protected against loss but also that timely access to them will
not be interrupted by a single point of failure. Since users can set access policies