Fig. 2 Comparison of Pro-Coffee and T-Coffee alignments of the proximal promoter region of the human gene
C18orf19. Yellow boxes indicate ChIP-seq regions for the transcription factor CEBPA. Predicted CEBPA-binding
sites are shown in green when they fall inside ChIP-seq regions and in red when they fall outside. Pro-Coffee
correctly aligns the factor-binding regions and their binding sites, whereas the default T-Coffee fails to do so.
3.2 Aligning DNA/RNA Sequences
1. Functional DNA elements: Pro-Coffee
Aligning non-transcribed DNA is probably one of the most challenging tasks in the field of sequence
alignment, owing to the reduced alphabet of nucleic acid sequences and the heterogeneity of functional
features contained in genomic sequences. However, making use of footprints in homologous promoter or
enhancer regions can increase your chances in motif finding or when scanning for known motifs using
programs that accept alignments as input [13, 14]. Pro-Coffee [15] was designed to address this need.
It makes use of a substitution matrix between dinucleotides, where the substitution counts were
estimated from the seed alignments of TRANSFAC weight matrices. Pro-Coffee is run using the
following command:
t_coffee -seq c18orf19.fasta -mode procoffee
The accuracy of a promoter alignment (Fig. 2) may increase when considering longer sequences. If the
region of interest lies 500 bp upstream of the transcription start site (TSS), it can be beneficial
to align a longer stretch, say from -1,500 bp to +500 bp relative to the TSS, and then ignore the part
of the resulting alignment you are not interested in. You can extract a block from, e.g., position
1,000 to position 1,500 with respect to a reference sequence "ref" in your aligned sequences using
the command:
t_coffee -other_pg seq_reformat -in c18orf19.aln -action +extract_block 'ref' 1000 1500
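When the aligned stretch is defined relative to the TSS, the block boundaries have to be converted into
positions along the reference sequence before calling +extract_block. The sketch below shows one way to
do that arithmetic. The helper name tss_to_pos is hypothetical and not part of T-Coffee, and the
coordinate convention (the base immediately upstream of the TSS is -1, the TSS itself is +1, and there
is no position 0) is an assumption; adjust it if your annotation counts differently. The block only
prints the resulting command rather than running it.

```shell
#!/bin/sh
# Hypothetical helper (not part of T-Coffee): convert a TSS-relative offset
# into a 1-based position in a reference sequence whose first base lies
# `upstream` bp upstream of the TSS.
# Assumed convention: ..., -2, -1, +1, +2, ... (no position 0).
tss_to_pos() {
    offset=$1      # TSS-relative coordinate, e.g. -500 or 500
    upstream=$2    # number of upstream bases included, e.g. 1500
    if [ "$offset" -lt 0 ]; then
        echo $(( offset + upstream + 1 ))
    else
        echo $(( offset + upstream ))
    fi
}

# Region of interest: the 500 bp immediately upstream of the TSS,
# inside a stretch that starts at -1,500 bp.
start=$(tss_to_pos -500 1500)
end=$(tss_to_pos -1 1500)

# Print the extraction command with the computed boundaries.
echo "t_coffee -other_pg seq_reformat -in c18orf19.aln -action +extract_block 'ref' $start $end"
```

Under these assumptions the 500 bp upstream of the TSS map to positions 1001 to 1500 of the reference
sequence, which is close to the 1,000 to 1,500 block used in the command above.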