Today, one also has access to large promoter sets defined by high-
throughput methods such as expressed sequence tag (EST) sequencing
of oligo-capped cDNAs 38 and CAGE. 39 However, in promoter analysis,
one still faces the problem that many motifs may only be weakly over-
represented and located at a highly variable distance from the TSS. An
enrichment of motifs in genomic sequences can be achieved in various
ways. For instance, the motif search could be targeted at specific regula-
tory motifs by restricting the input sequences to promoters of genes that
are regulated in a particular way; see, for instance, Roth et al. 40 Likewise,
genome-wide chromatin immunoprecipitation profiles for more than
200 transcription factors have been used to select yeast promoters that
are occupied in vivo by a given factor in order to define its cognate bind-
ing site motifs with six different motif discovery methods. 41 A presum-
ably very accurate binding site weight matrix has been derived from over
13 000 in vivo mapped sites for the insulator protein CTCF. 42 Another
currently popular approach, exemplified in Xie et al., 26 is to restrict the
motif search to sequence regions that are conserved across genomes.
Specialized motif discovery algorithms, such as PhyloGibbs, 43 can exploit
information about phylogenetic conservation contained in a multiple
sequence alignment given as input to the method.
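
As an illustration of what "weakly over-represented" can mean in practice, the following minimal sketch compares how often a word occurs in a selected promoter set versus a background set and computes a hypergeometric tail probability. It is not the procedure of any of the cited studies. The function names are hypothetical, and the sketch assumes exact word matching on one strand only, which is far cruder than a weight matrix search.

```python
from math import comb

def contains_motif(seq, motif):
    """True if the exact word `motif` occurs in `seq` (given strand only)."""
    return motif.upper() in seq.upper()

def hypergeom_tail(k, n, K, N):
    """P(X >= k) when n sequences are drawn without replacement from N,
    of which K contain the motif."""
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

def enrichment(foreground, background, motif):
    """Return (foreground fraction, background fraction, p-value).

    Assumption: foreground and background are disjoint lists of sequences,
    and the p-value treats the foreground as a random draw from their union.
    """
    k = sum(contains_motif(s, motif) for s in foreground)
    b = sum(contains_motif(s, motif) for s in background)
    n = len(foreground)
    N = n + len(background)
    K = k + b
    return k / n, b / len(background), hypergeom_tail(k, n, K, N)
```

Restricting the foreground to promoters that share a regulatory response, or that are bound in vivo by a given factor, raises the foreground fraction relative to the background and thereby makes a weak signal easier to detect.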
The classical weight matrix model, which assumes that motifs are of
fixed length and that the contributions of individual bases at different
positions are additive and independent of each other, is likely to be an
oversimplification in most cases (see Benos et al. 44 for a critical discussion
of this issue). The weight array method takes nearest-neighbor depend-
encies into account by scoring overlapping dinucleotides rather than
individual bases. 45 Target motifs recognized by multimeric DNA-binding
proteins often consist of two or more short motifs separated by spacers
of slightly variable length. In fact, the Pribnow box shown in Fig. 1 is one
of two conserved motifs characteristic of the major class of E. coli promoters.
The classical expectation-maximization (EM) algorithm for motif optimization is readily
extensible to such bipartite, variable-length motif structures. 46 Ab initio
discovery of composite motifs (also referred to as transcription regulatory
modules), including combinations of motifs that may occur in different
order and orientation, is another currently very active research direction;
see Kel et al. 47 for an example.
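
The three scoring schemes discussed above can be contrasted in a short sketch. It assumes log-odds scores against a uniform background with a pseudocount of 1. All function names are hypothetical. The dinucleotide scorer is a simplified reading of the weight array idea (summing scores of overlapping dinucleotides) and should not be taken as the published formulation. The bipartite scan simply tries every allowed spacer length between two weight matrices and is not the EM procedure of ref. 46.

```python
import math

BASES = "ACGT"

def weight_matrix(sites, pseudocount=1.0, background=0.25):
    """Log-odds weight matrix: one {base: score} dict per motif position."""
    matrix = []
    for pos in range(len(sites[0])):
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        matrix.append({b: math.log2(counts[b] / total / background)
                       for b in BASES})
    return matrix

def score_matrix(matrix, window):
    """Additive score: each position contributes independently."""
    return sum(col[base] for col, base in zip(matrix, window))

def weight_array(sites, pseudocount=1.0, background=1 / 16):
    """Log-odds scores for the dinucleotide at positions (i, i+1)."""
    array = []
    for pos in range(len(sites[0]) - 1):
        counts = {a + b: pseudocount for a in BASES for b in BASES}
        for site in sites:
            counts[site[pos:pos + 2]] += 1
        total = sum(counts.values())
        array.append({d: math.log2(c / total / background)
                      for d, c in counts.items()})
    return array

def score_array(array, window):
    """Sum over overlapping dinucleotides, capturing neighbor dependencies."""
    return sum(col[window[i:i + 2]] for i, col in enumerate(array))

def best_bipartite(left, right, seq, min_gap, max_gap):
    """Best (score, start, gap) for left motif + variable spacer + right motif."""
    best = None
    for start in range(len(seq) - len(left) + 1):
        left_score = score_matrix(left, seq[start:start + len(left)])
        for gap in range(min_gap, max_gap + 1):
            r_start = start + len(left) + gap
            r_end = r_start + len(right)
            if r_end > len(seq):
                break  # longer spacers would also run off the sequence
            total = left_score + score_matrix(right, seq[r_start:r_end])
            if best is None or total > best[0]:
                best = (total, start, gap)
    return best
```

With training sites ACGT, ACGT, TGCA and TGCA, the weight matrix assigns the chimeric word ACCA the same score as the genuine site ACGT, because every column sees both bases equally often. The dinucleotide scorer ranks ACGT higher, since the pair CC was never observed at the second position.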