Biology Reference
In-Depth Information
consisting of a motif description (consensus sequence, or weight matrix
plus cut-off value) plus a region of preferential occurrence defined by
5′ and 3′ borders relative to the functional site. The key difference from
the classical motif search problem statement is that the location of the
motif relative to the reference position is transferred from input to
output. As a consequence, the borders of the preferred region become
targets for optimization and arguments of the objective function.
SSA uses a nonprobabilistic measure of local overrepresentation as an
objective function for assessing motif quality. The computation of this
measure is illustrated in Fig. 5. Briefly, the frequency of a given motif
is determined in a series of adjacent, nonoverlapping windows of identical
size, including the preferred region of occurrence as an individual
window. The total length of the analyzed sequence region is chosen ad hoc.
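The window layout described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the SSA implementation: the `Signal` class, the `tile_windows` function, and the analyzed range used in the usage note are hypothetical, and positions are treated as plain consecutive integers relative to the functional site.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    """Hypothetical container for a motif plus its preferred region."""
    consensus: str        # motif description, e.g. "TATAAT"
    max_mismatches: int   # mismatches tolerated when matching the consensus
    border_5p: int        # 5' border of the preferred region (inclusive)
    border_3p: int        # 3' border of the preferred region (inclusive)


def tile_windows(signal, range_start, range_end):
    """Tile [range_start, range_end] with adjacent, nonoverlapping windows.

    Every window has the size of the preferred region, and the preferred
    region itself is one of the windows. Partial windows at either end of
    the analyzed range are dropped (an assumption of this sketch).
    """
    if not (range_start <= signal.border_5p <= signal.border_3p <= range_end):
        raise ValueError("preferred region must lie inside the analyzed range")
    size = signal.border_3p - signal.border_5p + 1
    # Step left from the preferred region in whole windows.
    steps_left = (signal.border_5p - range_start) // size
    start = signal.border_5p - steps_left * size
    windows = []
    while start + size - 1 <= range_end:
        windows.append((start, start + size - 1))
        start += size
    return windows
```

For a preferred region from −13 to −6 and an ad hoc analyzed range of −21 to +10, `tile_windows(Signal("TATAAT", 2, -13, -6), -21, 10)` gives four 8-bp windows, with the preferred region as the second one.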
[Figure 5: six promoter sequences (Seq1 to Seq6) aligned relative to the
transcription start. Motif: TATAAT / 2 mismatches.]

Window              1       2       3       4
Motif frequency     0.00    1.00    0.33    0.00
Background freq.    0.44    0.11    0.33    0.44
LOR index          −0.44    0.89    0.00   −0.44
Fig. 5. Local overrepresentation: the objective function used by signal search
analysis (SSA). The example sequence set consists of the six E. coli promoter
sequences that led to the discovery of the Pribnow box motif [1]. This motif
typically occurs about 10 bp upstream of the transcription start site (TSS).
The frequency of this motif (here defined as TATAAT with two mismatches
allowed) is analyzed in a series of adjacent, nonoverlapping windows of 8 bp.
The motif frequency is defined as the fraction of sequences per window that
contain at least one motif instance (motifs spanning window boundaries are not
counted). The background frequency for a particular window is defined as the
mean of the motif frequencies in all other windows. The index of local
overrepresentation (LOR) is simply the difference between the local motif
frequency and the corresponding background frequency. In this example, the
analyzed motif is highly overrepresented in the second window of the series,
extending from relative positions −13 to −6.
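The definitions in the caption translate directly into a short computation. The following Python sketch is one possible reading of them, not the SSA code itself: the function names (`hamming`, `window_has_motif`, `motif_frequencies`, `lor_from_frequencies`) are hypothetical, sequences are assumed to be equal-length strings aligned at the same reference position, and windows are counted from the first base of the analyzed region.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def window_has_motif(seq, motif, max_mismatches, start, end):
    """True if a motif instance lies entirely inside seq[start:end].

    Instances that span a window boundary are not counted.
    """
    k = len(motif)
    return any(
        hamming(seq[i:i + k], motif) <= max_mismatches
        for i in range(start, end - k + 1)
    )


def motif_frequencies(seqs, motif, max_mismatches, window_size):
    """Fraction of sequences with at least one motif instance, per window."""
    n_windows = min(len(s) for s in seqs) // window_size
    freqs = []
    for w in range(n_windows):
        start, end = w * window_size, (w + 1) * window_size
        hits = sum(
            window_has_motif(s, motif, max_mismatches, start, end)
            for s in seqs
        )
        freqs.append(hits / len(seqs))
    return freqs


def lor_from_frequencies(freqs):
    """Return (background, lor) lists for a series of window frequencies.

    The background of a window is the mean frequency of all other windows;
    LOR is the local frequency minus that background.
    """
    if len(freqs) < 2:
        raise ValueError("at least two windows are needed for a background")
    total = sum(freqs)
    background = [(total - f) / (len(freqs) - 1) for f in freqs]
    lor = [f - b for f, b in zip(freqs, background)]
    return background, lor
```

Applied to the motif frequencies of Fig. 5 (0.00, 1.00, 1/3, 0.00), `lor_from_frequencies` gives backgrounds of about 0.44, 0.11, 0.33, and 0.44 and LOR values of about −0.44, 0.89, 0.00, and −0.44. The first and last windows come out negative because the motif is less frequent there than in the remaining windows.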