Biomedical Engineering Reference
genes of high versus low expression and its activating or repressive role
confirmed.
Creating read profiles using GenomicTools is straightforward, as
demonstrated in the following example. First, the user creates the TSS
regions using as input the gene transcript chromosomal coordinates in
'genes.bed', which can be downloaded, for example, from the UCSC
Genome Browser web site. This is done using the genomic_regions tool
'pos' and 'shift' operations: the former chooses the 5' end of gene
transcripts (i.e. the TSS) and the latter performs a 10 kb flanking operation
upstream and downstream of the TSS.
$ head genes.bed
chr1 3044313 3044814 ENSMUSG00000090025:ENSMUST00000160944 1000 +
chr1 3092096 3092206 ENSMUSG00000064842:ENSMUST00000082908 1000 +
chr1 3456667 3503634 ENSMUSG00000089699:ENSMUST00000161581 1000 +
chr1 3670235 3671871 ENSMUSG00000073742:ENSMUST00000097833 1000 +
...
$ cat genes.bed | genomic_regions pos -op 5p | genomic_regions shiftp \
    -5p -10000 -3p +10000 > TSS.10kb.bed
$ head TSS.10kb.bed
chr1 3034313 3054314 ENSMUSG00000090025:ENSMUST00000160944 1000 +
chr1 3082096 3102097 ENSMUSG00000064842:ENSMUST00000082908 1000 +
chr1 3446667 3466668 ENSMUSG00000089699:ENSMUST00000161581 1000 +
chr1 3660235 3680236 ENSMUSG00000073742:ENSMUST00000097833 1000 +
chr1 4509097 4529098 ENSMUSG00000064376:ENSMUST00000082442 1000 +
chr1 4787868 4807869 ENSMUSG00000025903:ENSMUST00000134384 1000 +
chr1 4787903 4807904 ENSMUSG00000025903:ENSMUST00000027036 1000 +
...
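The arithmetic behind this step can be checked against the listing above with a small stand-alone sketch. The function below is a hypothetical awk helper, not part of GenomicTools: it assumes six-column, whitespace-separated BED input, takes the 5' end as a 1-bp interval (the start coordinate on the '+' strand and, as an assumption, the last base on the '-' strand), and extends it by the same flank on both sides. Clamping the start at zero and writing tab-separated output are also assumptions of the sketch.

```shell
# Hypothetical helper (not a GenomicTools command): mimics the 5' end
# selection followed by symmetric flanking, for illustration only.
# Usage: tss_window FLANK_BP < genes.bed
tss_window() {
  awk -v flank="$1" 'BEGIN { OFS = "\t" }
  NF >= 6 {
    # 1-bp interval at the 5-prime end: start for "+", end-1 for "-" (assumed)
    tss = ($6 == "-") ? $3 - 1 : $2
    start = tss - flank
    if (start < 0) start = 0        # assumption: clamp at the chromosome start
    print $1, start, tss + 1 + flank, $4, $5, $6
  }'
}

# First transcript from the genes.bed listing
printf 'chr1 3044313 3044814 ENSMUSG00000090025:ENSMUST00000160944 1000 +\n' \
  | tss_window 10000
```

For the first transcript this yields the interval 3034313 to 3054314, matching the first line of TSS.10kb.bed: the 1-bp TSS interval 3044313 to 3044314 widened by 10000 bp on each side.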
Next, the distances of the mapped ChIP-seq reads from the TSS regions
are computed using the genomic_overlaps tool 'offset' operation. The
'offset' operation allows the user to choose a reference point for the query
regions ('-op' option), and to express the computed offset as a fraction of
the query region size ('-a' option) instead of an absolute number. Also, in
this particular application, the strand information is ignored ('-i' option),
because binding occurs in both sense and anti-sense orientations relative to
the affected transcript.
$ head chipseq.bed
chr1 3001228 3001229
chr1 3001228 3001229
chr1 3001438 3001439
...
$ cat chipseq.bed | genomic_overlaps offset -v -i -op 5p -a TSS.10kb.bed \
    | cut -d' ' -f1 > offset.txt
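To make the idea of a fractional offset concrete, the following is a minimal sketch of one plausible reading of the computation; it is not the GenomicTools implementation, and the exact output format and coordinate conventions of the 'offset' operation are not reproduced here. The hypothetical helper read_offsets assumes whitespace-separated BED input, ignores strand, and, for every read whose start falls inside a window, prints the distance from the window start divided by the window length. It compares every read against every window, so it is only suitable for small inputs.

```shell
# Hypothetical helper (not a GenomicTools command), for illustration only.
# Usage: read_offsets WINDOWS.bed < reads.bed
read_offsets() {
  awk -v wfile="$1" '
  BEGIN {
    n = 0
    # Load all windows (chromosome, start, end) into memory
    while ((getline line < wfile) > 0) {
      split(line, f, " ")
      chrom[n] = f[1]; ws[n] = f[2] + 0; we[n] = f[3] + 0; n++
    }
    close(wfile)
  }
  {
    # Strand is ignored; report each read start as a fraction of the window
    for (i = 0; i < n; i++)
      if ($1 == chrom[i] && $2 + 0 >= ws[i] && $2 + 0 < we[i])
        printf "%.4f\n", ($2 - ws[i]) / (we[i] - ws[i])
  }'
}

# Example call (file names as in the text):
#   read_offsets TSS.10kb.bed < chipseq.bed > offset.sketch.txt
```

Under these assumptions a read located exactly at the TSS of a 10 kb flanked window gets a value close to 0.5, so the read profile is centered on the middle of the fractional scale.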