upstream to downstream from the TSS. For instance, Ohler et al. 25 used
sequences between relative positions −60 and +40 for the identification
of core promoter elements in Drosophila. A principal limitation of this
approach is that it ignores a motif's specific positional distribution
around the TSS, which varies widely between motifs. For instance, the
eukaryotic TATA box occurs at a rather fixed distance of about
30 bp ± 5 bp upstream from the TSS; conversely, the CCAAT box occurs
within a large region of about 150 bp with a maximum at −80
(Fig. 4). 34 Realistic objective functions for promoter motif discovery
have to account for such differences. At equal frequency, a motif
predominantly occurring within a narrow distance range should be
considered more significant than one that is evenly distributed over the
entire promoter region considered.
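The point about equal frequency but different positional spread can be illustrated with a small sketch. It is not the objective function of any published method: the null model (hits falling uniformly over the region), the binomial tail as a score, and the names `binomial_tail` and `best_window_pvalue` are all assumptions made for illustration, and no correction for testing many windows is applied.

```python
from math import comb

def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def best_window_pvalue(positions, region_start, region_end, width):
    """Smallest binomial tail probability over all windows of `width` bp.

    positions: motif hit positions relative to the TSS, one entry per hit.
    Assumed null model: each hit falls uniformly in [region_start, region_end].
    """
    n = len(positions)
    region_len = region_end - region_start + 1
    p = width / region_len  # chance that a single hit lands in a given window
    best = 1.0
    for start in range(region_start, region_end - width + 2):
        k = sum(start <= x < start + width for x in positions)
        best = min(best, binomial_tail(k, n, p))
    return best

# Two hypothetical motifs with the same number of hits (20) in the region -150..-1.
peaked = [-32 + (i % 5) for i in range(20)]   # all hits between -32 and -28
uniform = [-148 + 7 * i for i in range(20)]   # hits spread evenly over the region

p_peaked = best_window_pvalue(peaked, -150, -1, 10)
p_uniform = best_window_pvalue(uniform, -150, -1, 10)
print(p_peaked, p_uniform)
```

Under these assumptions the concentrated motif receives a far smaller tail probability than the evenly spread one, although both occur equally often.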
Signal search analysis (SSA) is an early method that takes positional
distributions of motifs into account. It is based on the concept of a locally
overrepresented sequence motif, which leads to a reformulation of the
motif discovery problem, as will be explained below. The input to SSA is
a set of genome sequences together with a list of so-called “functional
positions”, e.g. a list of TSSs. The result is a locally overrepresented motif
Fig. 4. Positional distributions of promoter sequence motifs around a human
transcription start site (TSS). Shown are the distributions of the TATA and CCAAT boxes
relative to 1867 precisely mapped TSSs from the Eukaryotic Promoter Database
(EPD), release 93. 36 The plot is based on the weight matrices published in Bucher. 34
The motif frequencies were determined in overlapping windows of 20 bp for the
TATA box, and 50 bp for the CCAAT box.
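A positional profile of the kind described in the caption could be computed along the following lines. This is a minimal sketch under stated assumptions: "frequency" is taken to mean the fraction of promoters with at least one motif hit inside the window, the step size between windows is arbitrary, and the function name `windowed_frequency` and the toy hit lists are hypothetical. The caption does not specify these details.

```python
def windowed_frequency(hits_per_promoter, start, end, width, step=1):
    """Return [(window_center, fraction_of_promoters_with_a_hit), ...].

    hits_per_promoter: one list of hit positions (relative to the TSS)
    per promoter. Windows of `width` bp slide from `start` to `end` in
    increments of `step`, so they overlap whenever step < width.
    """
    n = len(hits_per_promoter)
    profile = []
    for left in range(start, end - width + 2, step):
        right = left + width - 1
        with_hit = sum(
            any(left <= pos <= right for pos in hits)
            for hits in hits_per_promoter
        )
        profile.append((left + (width - 1) / 2, with_hit / n))
    return profile

# Toy data: four promoters, three of which carry a hit near -30.
toy_hits = [[-30], [-28], [-31, -80], []]
profile = windowed_frequency(toy_hits, -100, 0, width=20, step=10)
peak = max(profile, key=lambda item: item[1])
print(peak)  # window centre and frequency of the highest window
```

A wider window (such as the 50 bp used for the CCAAT box in the caption) smooths the profile at the cost of positional resolution, which suits a motif with a broad distribution.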