Biomedical Engineering Reference
In-Depth Information
spectra packet. For particularly difficult searches, such as those with no
peptide specificity (unconstrained searches), it is advisable to use a spectra
packet size of one spectrum.
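The effect of the packet size can be sketched in a few lines of Python. This is only an illustration of splitting a list of spectra into packets; the function name `make_packets` and the scan identifiers are hypothetical and are not taken from any particular search engine.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def make_packets(spectra: Sequence[T], packet_size: int) -> Iterator[List[T]]:
    """Yield consecutive packets holding at most packet_size spectra each."""
    if packet_size < 1:
        raise ValueError("packet_size must be at least 1")
    for start in range(0, len(spectra), packet_size):
        yield list(spectra[start:start + packet_size])


# Hypothetical scan identifiers standing in for real spectra.
spectra_ids = ["scan_001", "scan_002", "scan_003", "scan_004", "scan_005"]

# A routine search might group several spectra per packet ...
routine_packets = list(make_packets(spectra_ids, packet_size=2))

# ... whereas an unconstrained search would use one spectrum per packet,
# so each unit of work handed to a node stays as small as possible.
unconstrained_packets = list(make_packets(spectra_ids, packet_size=1))

print(len(routine_packets), len(unconstrained_packets))
```

With a packet size of one, the number of packets equals the number of spectra, which gives the scheduler the finest possible granularity at the price of more per-packet overhead.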
23.3.2 Other Bioinformatics Tools
The J. Craig Venter Institute (JCVI) has produced an AMI, named JCVI Cloud
Bio-Linux, that is preconfigured with many of the standard bioinformatics
tools. The instance is based on 64-bit Ubuntu Linux and contains the
Celera Assembler [9], the European Molecular Biology Open Software Suite
[10], BLAST [11], ClustalW [12], Glimmer [13], GeneSpring [14], HMMER
[15], PHYLIP [16], and RasMol [17]. The goal of the project is to produce a
platform that groups can use to set up and distribute bioinformatics
analysis systems and data. The hope is to overcome the difficulties in installing
and setting up bioinformatics tools.
23.3.3 Next-Generation DNA Sequencing
One of the most significant challenges of bioinformatics is the analysis of the
huge volume of data generated by next-generation DNA sequencing efforts
[18]. This process produces millions of short sequence reads, which must be
aligned and merged to produce the final sequence. As the rate of sequencing
has accelerated, the data storage requirements have moved from megabytes
to gigabytes to terabytes and soon to petabytes. The computational time to
process these data has similarly increased. To address this, systems using cloud
computing have been developed. One of the uses of next-generation sequencing
is the mapping of genomes and identification of single-nucleotide
polymorphisms (SNPs). The CloudBurst application (described below) uses AWS
MapReduce and Hadoop to generate a cluster of computers to process the
alignment of reads from next-generation sequencing instruments [18]. The
algorithm is based on aligning reads to a reference genome and then extending
the alignment by adding additional reads. This is expedited by the hosting of
Ensembl and GenBank genomic data in S3, which makes the required reference
genome data available with low latency and no cost for transfer and storage.
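The way MapReduce fits read alignment can be illustrated with a toy, single-process sketch. It is not the CloudBurst code: it assumes a simple seed-and-extend scheme in which the map step emits fixed-length seeds (k-mers) from the reference and from the start of each read, the shuffle step groups records that share a seed, and the reduce step extends each seed hit by counting mismatches over the full read. All function names, the seed length `k`, and the mismatch threshold are illustrative assumptions.

```python
from collections import defaultdict


def map_reference(reference, k):
    """Map step: emit every k-mer of the reference keyed by its sequence."""
    for pos in range(len(reference) - k + 1):
        yield reference[pos:pos + k], ("ref", pos)


def map_read(read_id, read, k):
    """Map step: emit the first k bases of a read as its seed (an assumption)."""
    yield read[:k], ("read", read_id)


def shuffle(pairs):
    """Shuffle step: group all emitted values by their seed key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_seed(values, reference, reads, max_mismatches):
    """Reduce step: extend each seed hit across the full read length."""
    ref_positions = [v for tag, v in values if tag == "ref"]
    read_ids = [v for tag, v in values if tag == "read"]
    for read_id in read_ids:
        read = reads[read_id]
        for pos in ref_positions:
            window = reference[pos:pos + len(read)]
            if len(window) < len(read):
                continue  # read would run off the end of the reference
            mismatches = sum(a != b for a, b in zip(read, window))
            if mismatches <= max_mismatches:
                yield read_id, pos, mismatches


def align(reference, reads, k=4, max_mismatches=1):
    """Run the map, shuffle, and reduce steps in a single process."""
    pairs = list(map_reference(reference, k))
    for read_id, read in reads.items():
        pairs.extend(map_read(read_id, read, k))
    hits = []
    for values in shuffle(pairs).values():
        hits.extend(reduce_seed(values, reference, reads, max_mismatches))
    return sorted(hits)


# Made-up reference and reads, chosen only to exercise the code.
reference = "ACGTACGGTCATTGCA"
reads = {"r1": "ACGGTCAT", "r2": "TCATTGGA"}
hits = align(reference, reads)
print(hits)  # [('r1', 4, 0), ('r2', 8, 1)]
```

On a Hadoop cluster the map and reduce functions would run on different nodes and the framework would perform the shuffle. The point of the sketch is only that each seed group can be processed independently, which is what allows the work to be spread across many machines.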
The Crossbow system for DNA sequence alignment and SNP discovery,
developed at Johns Hopkins University, uses cloud computing to align
high-throughput DNA sequencing reads and find individual polymorphisms [19]. It
combines Bowtie [20] to align short reads and SoapSNP [21] to call genotypes.
It is based on MapReduce and uses Hadoop to parallelize the computational
load across multiple AWS instances. According to the developers, it can analyze
over 35-fold coverage of a human genome in 3 hours for about $85 using a
40-node, 320-core cluster rented from Amazon Web Services.
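The quoted figures imply a rough unit cost, worked out below. The calculation assumes that the $85 covers exactly 3 hours on all 40 nodes and nothing else. It ignores storage and transfer charges and any rounding to billing increments, so the results are estimates rather than published prices.

```python
# Figures quoted in the text.
nodes = 40
cores = 320
hours = 3
total_cost_usd = 85.0

# Derived quantities (assumes every node ran for the full 3 hours).
cores_per_node = cores // nodes          # 8 cores on each node
node_hours = nodes * hours               # 120 node-hours in total
core_hours = cores * hours               # 960 core-hours in total
cost_per_node_hour = total_cost_usd / node_hours
cost_per_core_hour = total_cost_usd / core_hours

print(f"{cores_per_node} cores per node")
print(f"${cost_per_node_hour:.2f} per node-hour")   # $0.71
print(f"${cost_per_core_hour:.3f} per core-hour")   # $0.089
```

In other words, the run amounts to 120 node-hours at roughly $0.71 each, or just under nine cents per core-hour.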
A similar program, also developed at Johns Hopkins University, is Myrna
[22]. Myrna also uses Bowtie and Hadoop, but rather than assemble entire
genomes, it measures gene expression by analyzing RNA-seq data sets. Like