requesting the stored image and the required number and type of resources.
This deployment procedure was carried out for EXP-PAC.
To set up the EXP-PAC cloud image, a complex deployment procedure is
carried out (see Figure 11.8). First, an Ubuntu server AMI is selected from
the Amazon EC2 web interface and launched. Second, as this image is not
from a trusted source, steps must be taken to ensure the image has not
been compromised. Antivirus scans are performed, and the Ubuntu image
is updated to ensure there are no vulnerabilities. Next, using the Ubuntu
software repository, LAMP is installed; this software stack contains the
principal components (Apache, PHP, and MySQL) of a viable general-
purpose web server. PHP and Apache are configured, increasing the POST
and upload data limits to support large data uploads and analysis. EXP-PAC is
then placed into the web server directory and configured to use the MySQL
database. To enable the HPC features of EXP-PAC, Open MPI and Bioconductor
are also deployed on this server. The Amazon cloud image is then stored
in its modified form for future use.
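The setup steps above can be sketched as a provisioning script. This is an illustrative sketch, not the authors' actual script: the package names follow the Ubuntu repositories of that era, and the PHP limit values, php.ini path, and EXP-PAC install path are assumptions.

```shell
#!/bin/sh
# Provisioning sketch for the EXP-PAC image (paths and limits are assumptions).
set -e

# Update the base Ubuntu image to close known vulnerabilities.
sudo apt-get update && sudo apt-get -y upgrade

# Install the LAMP stack (Apache, MySQL, PHP) plus Open MPI and R.
sudo apt-get -y install lamp-server^ openmpi-bin libopenmpi-dev r-base

# Raise PHP's POST and upload limits to support large data uploads
# (example values; the php.ini path varies by PHP version).
sudo sed -i 's/^post_max_size.*/post_max_size = 2G/' /etc/php5/apache2/php.ini
sudo sed -i 's/^upload_max_filesize.*/upload_max_filesize = 2G/' /etc/php5/apache2/php.ini

# Install Bioconductor inside R (hypothetical one-liner for that era's installer).
sudo Rscript -e 'source("http://bioconductor.org/biocLite.R"); biocLite()'

# Place EXP-PAC in the web server directory and restart Apache.
sudo cp -r exp-pac/ /var/www/
sudo service apache2 restart
```

After these steps the running instance is saved as a new AMI, so later deployments skip the manual configuration.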
Publication of the EXP-PAC virtual machine image to Uncinus was
performed through a web interface (see Figure 11.9). The virtual machine
publication interface allows users to specify information about the published cloud
image that is used during deployment. The attributes required to publish a
virtual machine image are the AMI ID of the cloud image, a description of
the published cloud image, the supported instance types of the image, log-in
information, the home directory, and the OS utilized by the cloud image.
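As a concrete illustration, the published attributes might look like the following fragment. All values and field names are hypothetical paraphrases of the list above, not Uncinus's actual schema.

```shell
# Hypothetical publication record for the EXP-PAC image (illustrative values only).
AMI_ID="ami-0123abcd"                          # AMI ID of the stored cloud image
DESCRIPTION="EXP-PAC gene expression analysis server"
INSTANCE_TYPES="cc1.4xlarge"                   # supported instance types
LOGIN_USER="ubuntu"                            # log-in information
HOME_DIR="/home/ubuntu"                        # home directory
OS="Ubuntu Server"                             # OS utilized by the cloud image
```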
11.5.2 Workflow Execution
Once software has been deployed on the cloud, users can execute exposed
applications through published interfaces. To utilize the HPC normalization
methods provided by EXP-PAC, this case study was run on four cluster
compute instances (64-bit, dual-quad core; 23 GB RAM).
Breast cancer tumor RNAseq data (GSM721140) was downloaded from
the National Center for Biotechnology Information (NCBI). These data con-
tained 44.8 million sequence fragments, which were mapped (aligned) to the
human reference genome. Before the data could be analyzed, a number of preprocessing steps
were carried out on the data. First, SAMtools (Li et al. 2009) was used to
convert the downloaded data to a human-readable format. The converted
data were imported into HTSeq (Anders 2010), run in union mode, non-
stranded, which sorted and counted the sequence fragments that matched
known genes. The output of HTSeq was a list of genes and the number of
times each appeared in the tumor.
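The preprocessing steps can be sketched as a short pipeline. The file names are hypothetical placeholders; the HTSeq options mirror the union-mode, non-stranded settings described above.

```shell
#!/bin/sh
# Preprocessing sketch (file names are hypothetical placeholders).

# Convert the downloaded alignment to human-readable SAM with SAMtools.
samtools view -h GSM721140.bam > GSM721140.sam

# Sort and count fragments per known gene with HTSeq
# (union mode, non-stranded), writing per-gene counts to a file.
htseq-count -m union -s no GSM721140.sam genes.gtf > gene_counts.txt
```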
In addition to the list of expressed genes, it was necessary to identify the
number of mutations that had occurred in each gene. A mutation score was
given to each sequence by counting the bases that differed from the reference
genome. This process resulted in the creation of two data sets, a count of present
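A minimal sketch of the mutation score, assuming the read and the reference segment are already aligned base-for-base (both sequences here are made up for illustration):

```shell
# Count the bases in an aligned read that differ from the reference segment.
read_seq="ACGTACGT"   # hypothetical aligned read
ref_seq="ACGAACGA"    # hypothetical reference genome segment
score=$(awk -v a="$read_seq" -v b="$ref_seq" 'BEGIN {
  n = 0
  for (i = 1; i <= length(a); i++)
    if (substr(a, i, 1) != substr(b, i, 1)) n++
  print n
}')
echo "$score"   # two positions differ, so the score is 2
```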