single genomic intervals as in BEDTools; for example, it makes full use
of all 'exons' in BED entries, whereas BEDTools operates on single-'exon'
BED entries.
Full stream-computing design: in GenomicTools, no files are loaded
into memory; instead, they are processed as streams. This minimizes
memory requirements and allows several files to be processed
simultaneously (e.g. different replicates, patient samples, etc.).
C++ API: GenomicTools command-line operations are implemented as
C++ class methods in a convenient API, which developers may use to
write new applications entirely in C++, for example novel peak
finders.
Auxiliary tools: GenomicTools offers a set of auxiliary command-line
tools (permutation_test, vectors, and matrix) that implement basic
mathematical and statistical operations on vectors and matrices,
which facilitates the construction of command-line pipelines.
Performance: GenomicTools improves performance over BEDTools
in terms of both time and memory requirements.
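The multi-'exon' handling described at the start of the feature list above concerns BED entries that carry block information: in the BED12 layout, columns 10 to 12 hold blockCount, blockSizes, and blockStarts, with each block start measured relative to chromStart. As a minimal sketch, assuming tab-separated BED12 input, the following C++ shows how one such entry expands into its individual exon intervals. The names Interval, parseList, and bed12ToExons are hypothetical illustrations, not part of the GenomicTools API.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// A half-open genomic interval [start, end) on one chromosome (0-based, as in BED).
struct Interval {
    std::string chrom;
    std::int64_t start;
    std::int64_t end;
};

// Split a comma-separated list such as "100,200," into integers,
// ignoring the trailing comma that BED files commonly contain.
std::vector<std::int64_t> parseList(const std::string& field) {
    std::vector<std::int64_t> values;
    std::stringstream ss(field);
    std::string item;
    while (std::getline(ss, item, ',')) {
        if (!item.empty()) values.push_back(std::stoll(item));
    }
    return values;
}

// Hypothetical helper: expand one tab-separated BED12 line into its blocks ("exons").
// blockStarts are offsets relative to chromStart, per the UCSC BED format.
std::vector<Interval> bed12ToExons(const std::string& line) {
    std::vector<std::string> f;
    std::stringstream ss(line);
    std::string field;
    while (std::getline(ss, field, '\t')) f.push_back(field);
    if (f.size() < 12) throw std::invalid_argument("not a BED12 line");

    const std::int64_t chromStart = std::stoll(f[1]);
    const std::size_t blockCount = std::stoul(f[9]);
    const std::vector<std::int64_t> sizes = parseList(f[10]);
    const std::vector<std::int64_t> starts = parseList(f[11]);
    if (sizes.size() != blockCount || starts.size() != blockCount)
        throw std::invalid_argument("blockCount does not match the block lists");

    std::vector<Interval> exons;
    for (std::size_t i = 0; i < blockCount; ++i) {
        const std::int64_t s = chromStart + starts[i];
        exons.push_back({f[0], s, s + sizes[i]});
    }
    return exons;
}
```

For example, an entry with chromStart 1000, blockSizes "100,200," and blockStarts "0,800," yields the exons [1000, 1100) and [1800, 2000).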
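The stream-computing design in the feature list above can be illustrated with a sweep over two sorted inputs read at the same time. The sketch below assumes both BED files are sorted by chromosome name in byte order and then by start coordinate (as produced, for example, by LC_ALL=C sort -k1,1 -k2,2n). For each region in one file, it counts the overlapping reads from the other file, keeping in memory only the reads that could still overlap upcoming regions. This is a generic sweep-line technique chosen for illustration; Bed3, nextRecord, and streamOverlapCounts are hypothetical names, and the code is not taken from GenomicTools.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <deque>
#include <istream>
#include <optional>
#include <ostream>
#include <sstream>
#include <string>

// First three BED columns: a half-open interval [start, end) on chrom.
struct Bed3 {
    std::string chrom;
    std::int64_t start = 0;
    std::int64_t end = 0;
};

// Read the next BED3 record from a stream, skipping blank and '#' lines.
// Returns false at end of input.
bool nextRecord(std::istream& in, Bed3& rec) {
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty() || line[0] == '#') continue;
        std::istringstream fields(line);
        if (fields >> rec.chrom >> rec.start >> rec.end) return true;
    }
    return false;
}

// Hypothetical example: for each region, write "chrom<TAB>start<TAB>end<TAB>count",
// where count is the number of reads overlapping the region. Both inputs must be
// sorted by chrom (byte order) and then by start.
void streamOverlapCounts(std::istream& regions, std::istream& reads, std::ostream& out) {
    std::deque<Bed3> active;      // reads that may overlap the current or later regions
    std::optional<Bed3> pending;  // one read of look-ahead, not yet placed
    std::string currentChrom;
    Bed3 region, candidate;

    while (nextRecord(regions, region)) {
        if (region.chrom != currentChrom) {  // new chromosome: nothing carries over
            active.clear();
            currentChrom = region.chrom;
        }
        // Consume reads that start before this region ends.
        while (true) {
            if (!pending) {
                if (!nextRecord(reads, candidate)) break;
                pending = candidate;
            }
            if (pending->chrom < region.chrom) {  // chromosome already passed: skip
                pending.reset();
            } else if (pending->chrom == region.chrom && pending->start < region.end) {
                active.push_back(*pending);
                pending.reset();
            } else {
                break;  // read lies beyond this region; keep it for a later region
            }
        }
        // Regions are sorted by start, so reads ending at or before this start
        // can never overlap a later region and are dropped.
        active.erase(std::remove_if(active.begin(), active.end(),
                                    [&](const Bed3& r) { return r.end <= region.start; }),
                     active.end());
        std::int64_t count = 0;
        for (const Bed3& r : active) {
            if (r.start < region.end && r.end > region.start) ++count;
        }
        out << region.chrom << '\t' << region.start << '\t' << region.end << '\t' << count << '\n';
    }
}
```

The memory footprint of this sweep depends on how many reads overlap the current window rather than on file size, which is the property a stream-based design relies on.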
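To illustrate the design idea behind the C++ API item above (an operation implemented as a class method, so that a command-line program can act as a thin wrapper around it), here is a minimal, hypothetical peak finder that reports runs of binned signal at or above a threshold. ThresholdPeakFinder and findPeaks are invented for this example and do not reflect actual GenomicTools class names or method signatures.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical example class: it is NOT part of the GenomicTools API.
class ThresholdPeakFinder {
public:
    ThresholdPeakFinder(double threshold, std::size_t minLength)
        : threshold_(threshold), minLength_(minLength) {
        assert(minLength_ > 0 && "minLength must be positive");
    }

    // Return half-open bin ranges [first, second) where the signal stays at or
    // above the threshold for at least minLength consecutive bins.
    std::vector<std::pair<std::size_t, std::size_t>> findPeaks(const std::vector<double>& signal) const {
        std::vector<std::pair<std::size_t, std::size_t>> peaks;
        std::size_t runStart = 0;
        bool inRun = false;
        // Iterate one past the end so that a run reaching the last bin is closed.
        for (std::size_t i = 0; i <= signal.size(); ++i) {
            const bool above = i < signal.size() && signal[i] >= threshold_;
            if (above && !inRun) {
                runStart = i;
                inRun = true;
            } else if (!above && inRun) {
                if (i - runStart >= minLength_) peaks.emplace_back(runStart, i);
                inRun = false;
            }
        }
        return peaks;
    }

private:
    double threshold_;
    std::size_t minLength_;
};
```

For a signal of 0, 5, 6, 0, 7, 0, 0, 8, 9, 9 with threshold 5 and minimum length 2, findPeaks returns the bin ranges [1, 3) and [7, 10); the single bin at index 4 is discarded as too short.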
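The permutation_test tool named among the auxiliary tools above is not described in detail in this section. As a generic illustration of this kind of statistical operation, the sketch below implements a two-sided permutation test for a difference in means between two samples. The function names, the choice of statistic, and the add-one p-value estimate are assumptions for illustration, not the behavior or interface of the GenomicTools tool.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Mean of v[begin, end).
double rangeMean(const std::vector<double>& v, std::size_t begin, std::size_t end) {
    const auto first = v.begin() + static_cast<std::ptrdiff_t>(begin);
    const auto last = v.begin() + static_cast<std::ptrdiff_t>(end);
    return std::accumulate(first, last, 0.0) / static_cast<double>(end - begin);
}

// Hypothetical two-sided permutation test for a difference in means between
// samples a and b. The pooled values are shuffled nPermutations times; the
// p-value is the add-one estimate (extreme + 1) / (nPermutations + 1).
double permutationTestPValue(const std::vector<double>& a, const std::vector<double>& b,
                             std::size_t nPermutations, unsigned seed) {
    assert(!a.empty() && !b.empty());
    std::vector<double> pooled(a);
    pooled.insert(pooled.end(), b.begin(), b.end());
    const std::size_t na = a.size();

    // Difference between the mean of the first na values and the mean of the rest.
    auto meanDiff = [na](const std::vector<double>& v) {
        return rangeMean(v, 0, na) - rangeMean(v, na, v.size());
    };

    const double observed = std::fabs(meanDiff(pooled));
    const double tolerance = 1e-12;  // guards against rounding when comparing ties
    std::mt19937 rng(seed);
    std::size_t extreme = 0;
    for (std::size_t i = 0; i < nPermutations; ++i) {
        std::shuffle(pooled.begin(), pooled.end(), rng);
        if (std::fabs(meanDiff(pooled)) >= observed - tolerance) ++extreme;
    }
    return (static_cast<double>(extreme) + 1.0) / (static_cast<double>(nPermutations) + 1.0);
}
```

The add-one correction keeps the estimated p-value strictly positive, and the fixed seed makes runs reproducible; a pipeline could, for instance, compare signal values in two groups of regions this way.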
Research institutions as well as industry sectors in the life sciences
that make extensive use of high-throughput sequencing technologies,
such as pharmaceutical and medical research companies, are expected
to use these tools. Naturally, the tools can also be used for many
kinds of genomics studies, which is in fact the purpose for which
they were initially developed.
More specifically, our computational genomics group at IBM Research
used an early version of this tool, before its public release, for
computational studies of repeat elements in mammalian genomes
[21, 22], analysis of gene expression tiling array data in
Drosophila [23], and the study of dynamic changes in human DNA
methylation during differentiation [24].
The rest of this chapter presents the GenomicTools platform (version
2.0, released in September 2011) and is organized as follows. The
next section provides the necessary definitions as well as fundamental
information on the input file formats used in GenomicTools. An
overview of the tools follows, together with a more in-depth
presentation of some aspects of the C++ implementation. Several
practical examples then show how GenomicTools can be used for
computational genomics analyses in the context of a simple ChIP-seq
pipeline case study. Finally, the performance of GenomicTools is
compared with that of BEDTools and Bioconductor.