GenomicTools: an open source platform for developing high-throughput analytics in genomics - Open Source Software in Life Science Research - page 208


For this evaluation we used sequenced reads obtained from the DREAM project [35], more specifically from challenge #1 of the DREAM6 competition. We downloaded the original paired-end FASTQ files, which contain mRNA-seq data from human embryonic stem cells, from http://www.the-dream-project.org/challenges/dream6-alternative-splicing-challenge.

The FASTQ files were aligned to the reference human genome (version GRCh37, February 2009) using TopHat version 1.3.1 [36]. In total, ~86 million reads were aligned, and the alignments were converted from BAM to BED format. In this evaluation, we measured how both CPU time and memory usage scale with input size. The task was to identify all pairwise overlaps between a 'test' genomic interval file and a 'reference' genomic interval file. The test file was obtained by resampling without replacement from the ~86 million sequenced reads (samples of 1, 2, 4, 8, 16, 32 and 64 million reads), and the reference file contained all annotated transcript exons from the ENSEMBL database [37] as well as all annotated repeat elements from the UCSC Genome Browser [17], for a total of ~6.4 million entries.
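To make the benchmark task concrete, the following is a minimal Python sketch (not the GenomicTools implementation) of finding all pairwise overlaps between a test and a reference interval set, together with resampling reads without replacement. It assumes BED-style 0-based, half-open coordinates and illustrates why sorted input helps: once both sets are sorted by chromosome and start position, a single forward sweep compares each test interval only against a small window of nearby reference intervals instead of against the whole reference set. The names `subsample`, `by_chrom`, `overlaps` and `sweep_overlaps` are hypothetical, and the toy intervals stand in for the millions of reads and annotation entries used in the evaluation.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical representation of a BED-style interval:
# (chromosome, start, end) with 0-based, half-open [start, end) coordinates.
Interval = Tuple[str, int, int]


def subsample(reads: List[Interval], n: int, seed: int = 0) -> List[Interval]:
    """Draw n reads uniformly at random without replacement."""
    rng = random.Random(seed)
    return rng.sample(reads, n)


def by_chrom(intervals: List[Interval]) -> Dict[str, List[Interval]]:
    """Group intervals per chromosome, each group sorted by (start, end)."""
    groups: Dict[str, List[Interval]] = defaultdict(list)
    for iv in intervals:
        groups[iv[0]].append(iv)
    for ivs in groups.values():
        ivs.sort(key=lambda iv: (iv[1], iv[2]))
    return groups


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same chromosome share >= 1 base."""
    return a[1] < b[2] and b[1] < a[2]


def sweep_overlaps(test: List[Interval],
                   ref: List[Interval]) -> List[Tuple[Interval, Interval]]:
    """All overlapping (test, ref) pairs via a per-chromosome sweep."""
    pairs: List[Tuple[Interval, Interval]] = []
    ref_groups = by_chrom(ref)
    for chrom, tests in by_chrom(test).items():
        refs = ref_groups.get(chrom, [])
        j = 0        # index of the next reference not yet admitted to the window
        active: List[Interval] = []  # references that may still overlap later tests
        for t in tests:
            # Admit every reference that starts before this test interval ends.
            while j < len(refs) and refs[j][1] < t[2]:
                active.append(refs[j])
                j += 1
            # Drop references ending at or before t.start: later test intervals
            # start no earlier (sorted order), so these can never overlap again.
            active = [r for r in active if r[2] > t[1]]
            # An admitted reference may still start after t ends (it was admitted
            # for an earlier, longer test interval), so check explicitly.
            pairs.extend((t, r) for r in active if overlaps(t, r))
    return pairs


if __name__ == "__main__":
    reads = [("chr1", 5, 15), ("chr1", 40, 50), ("chr2", 0, 10)]
    reference = [("chr1", 0, 10), ("chr1", 12, 20), ("chr2", 20, 30)]
    print(len(subsample(reads, 2, seed=1)))  # 2
    print(sweep_overlaps(reads, reference))
    # [(('chr1', 5, 15), ('chr1', 0, 10)), (('chr1', 5, 15), ('chr1', 12, 20))]
```

A brute-force double loop over all test/reference pairs returns the same pairs, but its cost grows with the product of the two set sizes, which is impractical at the scale of this evaluation. The sweep's cost instead grows roughly with the input sizes plus the number of reported overlaps, provided only a few reference intervals are active at any one time; if the inputs are not already sorted, the sorting step adds its own cost.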

As demonstrated in Figure 8.9, GenomicTools greatly improves running time (a speed-up of up to ~3.8 times compared to BEDTools and ~7.0 times compared to the IRanges package of Bioconductor) if the inputs are sorted,

Figure 8.9 Time evaluation of the overlap operation between a set of sequenced reads of variable size (1 through 64 million reads, logarithmic scale) and a reference set comprising annotated exons and repeat elements (~6.4 million entries). Using GenomicTools on sorted input regions yields a speed-up of up to ~3.8 compared to BEDTools and ~7.0 compared to the IRanges package of Bioconductor.
