Biomedical Engineering Reference
format [28]. Additionally, we propose a new simple format, the REG
format, as an attempt to distill the minimum common information from
the BED/GFF/SAM formats while allowing for the more general definition
of a genomic region as defined above. Each line in a REG file represents
a labeled genomic region, where the label is separated from the genomic
region via a <TAB> character. A simple REG file representing a set of
RNA-seq reads is shown below.
Read#1<TAB>1 + 100 149
Read#2<TAB>1 + 102 151
. . .
Read#N<TAB>Y 10001 10050
Another example is the following REG file carrying information on gene
exons (note that every line is a set of genomic intervals).
Gene#1<TAB>1 + 160446 161690 1 + 161314 161525
...
Gene#N<TAB>Y 279704 279708 Y 279741 279839 Y 279911 279916
Note that this format is a generalization of the BED format because it
allows overlapping intervals within a given region (Gene#1 in the above
example). This is particularly useful when we need to group exons of a
set of transcript isoforms of the same gene. Additionally, it allows
intervals from different chromosomes and strands to be grouped in each
line, and this helps represent gene fusions and interchromosomal
associations.
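The line layout described above can be sketched as a small parser. This is only an illustration under stated assumptions, not the GenomicTools implementation: the names Interval, RegRecord, parseRegLine and hasOverlappingIntervals are hypothetical, the strand token is treated as optional because some example lines above show none, coordinates are assumed to be inclusive, and C++17 is assumed for std::optional.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical types for illustration only; not the GenomicTools API.
struct Interval {
    std::string chromosome;
    char strand;   // '+', '-', or '.' when the line gives no strand (assumption)
    long start;    // assumed inclusive
    long stop;     // assumed inclusive
};

struct RegRecord {
    std::string label;
    std::vector<Interval> intervals;
};

// Parses one REG line of the form: LABEL <TAB> CHR [STRAND] START STOP ...
// Returns std::nullopt when the line is malformed.
std::optional<RegRecord> parseRegLine(const std::string& line) {
    const std::size_t tab = line.find('\t');
    if (tab == std::string::npos) return std::nullopt;

    RegRecord record;
    record.label = line.substr(0, tab);

    // Split everything after the tab into whitespace-separated tokens.
    std::istringstream fields(line.substr(tab + 1));
    std::vector<std::string> tokens;
    for (std::string token; fields >> token;) tokens.push_back(token);

    std::size_t i = 0;
    while (i < tokens.size()) {
        Interval interval;
        interval.chromosome = tokens[i++];
        interval.strand = '.';
        if (i < tokens.size() && (tokens[i] == "+" || tokens[i] == "-")) {
            interval.strand = tokens[i++][0];
        }
        if (i + 1 >= tokens.size()) return std::nullopt;  // START and STOP required
        try {
            interval.start = std::stol(tokens[i++]);
            interval.stop = std::stol(tokens[i++]);
        } catch (const std::exception&) {
            return std::nullopt;  // non-numeric or out-of-range coordinate
        }
        if (interval.start > interval.stop) return std::nullopt;
        record.intervals.push_back(interval);
    }
    if (record.intervals.empty()) return std::nullopt;
    return record;
}

// True if any two intervals on the same chromosome and strand share a position.
bool hasOverlappingIntervals(const RegRecord& record) {
    const std::vector<Interval>& v = record.intervals;
    for (std::size_t a = 0; a < v.size(); ++a) {
        for (std::size_t b = a + 1; b < v.size(); ++b) {
            if (v[a].chromosome == v[b].chromosome && v[a].strand == v[b].strand &&
                v[a].start <= v[b].stop && v[b].start <= v[a].stop) {
                return true;
            }
        }
    }
    return false;
}
```

With these definitions, the Gene#1 line from the exon example parses into two intervals that overlap, which is exactly the case BED cannot express within a single region.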
In terms of C++ implementation, each genomic region (i.e., each line in
an input file) is stored as an instance of the GenomicRegion class or its
derived classes for BED, GFF, and SAM formats (see C++ API for
developers for details). The entire file is stored as an instance of the
GenomicRegionSet class, although not necessarily fully loaded in
memory.
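One way to picture a region set that is not fully loaded in memory is a reader that visits one line at a time. The sketch below illustrates that idea only; forEachRegLine is a hypothetical helper, and nothing here reflects how GenomicRegionSet is actually implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <istream>
#include <sstream>  // std::istringstream, handy for trying the helper on in-memory text
#include <string>

// Hypothetical helper (not part of GenomicTools): calls `visit` once per
// labeled line, passing the label and the raw region text after the tab.
// Only the current line is held in memory. Lines without a tab are skipped.
// Returns the number of lines visited.
std::size_t forEachRegLine(
    std::istream& in,
    const std::function<void(const std::string& label,
                             const std::string& region)>& visit) {
    std::size_t visited = 0;
    std::string line;
    while (std::getline(in, line)) {
        const std::size_t tab = line.find('\t');
        if (tab == std::string::npos) continue;  // not a labeled region line
        visit(line.substr(0, tab), line.substr(tab + 1));
        ++visited;
    }
    return visited;
}
```

Because the helper takes any std::istream, the same loop works on a file stream, standard input, or an in-memory string stream.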
8.3 Tools overview
The GenomicTools platform is built on top of the genomic_intervals C++
library described in the next section. Its functions are bundled in four
command-line tools: (1) genomic_regions, for basic genomic regions
operations; (2) genomic_overlaps for comparing sets of regions and
computing offsets; (3) genomic_scans for window-based operations; and
 