Biomedical Engineering Reference
In-Depth Information
8.1 Introduction
The advent of high-throughput sequencing techniques, initiated by
pyrosequencing in 2004 [1], is expected to accelerate the pace of discovery
in the life sciences. Indeed, the super-exponentially growing volume of
data (e.g. short sequence fragments referred to as reads) produced rapidly
and inexpensively by various high-throughput sequencing platforms allows
the scientific community to study specific biological problems in depth,
such as quantification of alternative splicing in tissues [2, 3], human
disease [4], discovery of new fusion genes in cancer [5, 6], improvement
of genome assembly [7], and transcript identification [8-11].
The common steps in many high-throughput sequencing studies
include: (1) alignment of reads directly to a reference transcriptome or
genome ('read mapping'); (2) identification of expressed genes, isoforms
or binding sites; and (3) differential analysis across samples. An in-depth
review of the standard steps in RNA-seq and ChIP-seq computational
pipelines was published by Pepke and colleagues [12]. It is worth pointing
out that genome-wide data, such as transcripts/genes, exons/introns,
promoter sites, sequences, multiple sequence alignments, transcription
factor binding sites, intergenic regions, repeat elements, microarray
probes (expression, SNP, CNV, etc.), sequencing data (RNA-seq, ChIP-seq,
DNA-seq, etc.), chromosomal conformations (3C-seq, 4C-seq, etc.),
and inter-chromosomal associations, can easily be represented as sets of
genomic intervals (see Figure 8.1).
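
To make the interval representation concrete, the following is a minimal Python sketch, not taken from any of the tools discussed in this chapter. It assumes a 0-based, half-open coordinate convention (as used by the BED format), and the names GenomicInterval, overlaps and find_overlaps, as well as the example coordinates, are hypothetical and chosen only for illustration.

```python
from typing import List, NamedTuple, Tuple


class GenomicInterval(NamedTuple):
    """A named region on a chromosome (hypothetical illustrative type)."""
    chrom: str
    start: int  # 0-based, inclusive (assumed convention)
    end: int    # exclusive (assumed convention)
    name: str


def overlaps(a: GenomicInterval, b: GenomicInterval) -> bool:
    """Two intervals overlap if they share a chromosome and at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def find_overlaps(queries: List[GenomicInterval],
                  features: List[GenomicInterval]) -> List[Tuple[str, str]]:
    """Naive all-against-all scan; returns (query name, feature name) pairs."""
    return [(q.name, f.name)
            for q in queries
            for f in features
            if overlaps(q, f)]


# Made-up example data: two annotated genes and three mapped reads.
genes = [
    GenomicInterval("chr1", 100, 500, "geneA"),
    GenomicInterval("chr2", 100, 500, "geneB"),
]
reads = [
    GenomicInterval("chr1", 450, 486, "read1"),  # inside geneA
    GenomicInterval("chr1", 500, 536, "read2"),  # starts where geneA ends
    GenomicInterval("chr2", 90, 126, "read3"),   # partially covers geneB
]

hits = find_overlaps(reads, genes)
print(hits)
```

The quadratic scan is chosen for readability only. With realistic data volumes, one would typically sort the intervals or use an index such as an interval tree, so that each query does not have to visit every feature.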
Given the huge volume of available data, new computational tools are
required to perform analysis tasks such as those outlined above
efficiently [13]. Currently, freely available computational tools
for large-scale data analytics include Bioconductor [14], Galaxy [15],
the Genomic Regions Enrichment of Annotations Tool (GREAT) [16], the UCSC
Genome Browser [17] and the Integrated Genome Browser (IGB) [18]. For
the readers' convenience, we report here the fundamental aspects of each
tool. Bioconductor uses the R statistical programming framework to
provide tools for the analysis and comprehension of high-throughput
genomic data. The functional scope of Bioconductor packages includes
the analysis of DNA microarray, sequence, flow, and SNP data. Galaxy is
an open web-based platform for genomic research, built around reusable
analysis templates that users can manipulate and run repeatedly on
different data sets. Galaxy has been used for different types of genomic
research, for example investigations of epigenetics, chromatin profiling,
transcriptional enhancers, and genome-environment interactions.
GREAT is available as a web application that was designed to analyze the