The motif frequency is defined as the fraction of sequences in a window
that contain at least one motif occurrence. Motifs overlapping window
borders are ignored for this purpose. Local overrepresentation (LOR)
is defined as the motif frequency within the window of preferential
occurrence minus the average motif frequency over all other
windows:
$$\mathrm{LOR} = f_j - \frac{\sum_{i \neq j} f_i}{N - 1} \qquad (13)$$
Here, f_j is the motif frequency in window j, and N is the total number
of windows. Note that the series of windows used to compute the
background frequency must be adjusted to the specific region for
which LOR is computed. The motif frequency outside the preferred
region, called the background frequency, serves the same function
as the null model in the classical motif discovery framework. Indeed,
a major strength of SSA is its use of a realistic null model based
on natural sequences from the same genomic environment. This may
explain why the weight matrices for major eukaryotic promoter
elements, which were derived by this method almost 20 years ago, are
still in use.
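To make the definitions of motif frequency and Eq. (13) concrete, here is a minimal Python sketch. It assumes that the sequences are aligned on a common reference point, that the windows form a fixed tiling of non-overlapping windows of equal width, and that the motif is a consensus string in which the letter N matches any base. The helper names (`matches`, `motif_frequencies`, `local_overrepresentation`) are hypothetical. The background is taken over every other window of that tiling, which covers only the simplest case of the region-specific adjustment of background windows described above.

```python
from typing import List


def matches(motif: str, seq: str, pos: int) -> bool:
    """Return True if motif matches seq starting at pos.

    Assumption: 'N' in the motif is a wildcard that matches any base.
    The caller must ensure pos + len(motif) <= len(seq).
    """
    window = seq[pos:pos + len(motif)]
    return all(m == "N" or m == b for m, b in zip(motif, window))


def motif_frequencies(seqs: List[str], motif: str, width: int) -> List[float]:
    """Motif frequency per window: fraction of sequences with at least one
    occurrence lying entirely inside the window.

    Windows are consecutive, non-overlapping and `width` bases wide, starting
    at position 0 (a hypothetical tiling). Occurrences crossing a window
    border are ignored because only start positions with p + k <= end are tried.
    """
    length = min(len(s) for s in seqs)  # use the common aligned length
    k = len(motif)
    freqs = []
    for j in range(length // width):
        start, end = j * width, (j + 1) * width
        hits = sum(
            any(matches(motif, s, p) for p in range(start, end - k + 1))
            for s in seqs
        )
        freqs.append(hits / len(seqs))
    return freqs


def local_overrepresentation(freqs: List[float], j: int) -> float:
    """Eq. (13): f_j minus the mean frequency over the other N - 1 windows."""
    n = len(freqs)
    background = (sum(freqs) - freqs[j]) / (n - 1)
    return freqs[j] - background


# Toy data: four sequences of 18 bases, three windows of 6 bases each.
seqs = [
    "GTATAGGGGGGGCCCCCC",
    "TATACCCCCCCCCCTATA",
    "GGGGTATAGGGGGGGGGG",  # TATA at positions 4..7 crosses the border at 6
    "TATAGGGGGGGGGGGGGG",
]
freqs = motif_frequencies(seqs, "TATA", 6)
print(freqs)                               # [0.75, 0.0, 0.25]
print(local_overrepresentation(freqs, 0))  # 0.625
```

In the toy data, the TATA occurrence in the third sequence straddles the border between the first two windows and is therefore counted in neither, following the rule that motifs overlapping window borders are ignored. The first window then has LOR = 0.75 - (0.0 + 0.25)/2 = 0.625.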
5.2. Search Algorithms for Locally Overrepresented Sequence Motifs
Two algorithms have been developed for the discovery of locally
overrepresented sequence motifs: one for consensus sequence motifs [35]
and one for weight matrices [34]. The former enumerates k-letter words,
possibly containing free positions represented by a wildcard character
and allowing a specified number of mismatches. The search space of the
preferred region is defined by a preselected fixed window, with the 5′
and 3′ borders of the complete sequence range being taken into
consideration. With the computing power available at the time this
method was conceived, the enumerative approach was feasible up to a
word length of about 6. To provide a heuristic search strategy for longer