The true problem with DNA motif discovery is that, for too many years,
biologists have published consensus sequences and weight matrices based
on very few sequences, whereas computational biologists were mostly
concerned with algorithmic improvements aimed at finding the globally
optimal motif with higher probability and in shorter time. In the future,
more effort should be spent on analyzing the limitations of motif
discovery viewed as a statistical inference problem.
5. Locally Overrepresented Sequence Motifs
This last section summarizes a variant of the classical motif search
problem, introduced by the author about 25 years ago for the study of
promoter sequences. This method, named signal search analysis (SSA) [33],
takes into account the fact that certain classes of DNA sequences such as
promoters are experimentally defined by “positions” rather than by
“borders”. First, let us elaborate on these two concepts. A set of regulatory
genomic sequence regions defined by deletion mutations, or a set of
oligonucleotides shown to bind a particular transcription factor in vitro,
constitutes a DNA sequence set defined by borders. The sequences are
of defined length, and the biologist has good reason to believe that a
particular sequence motif is hidden anywhere within the sequences. This
is exactly the experimental scenario to which the standard formulation
of the motif discovery problem applies. On the other hand, promoters
exemplify a sequence type defined by position; they are defined by the
location of the transcription start site (TSS), which can be mapped
experimentally. Promoter motifs are supposed to occur in the vicinity of
a TSS, but there is no experimental protocol that would allow delineating
the sequence range within which they must occur. Computational
biologists have to cope with this missing information problem in
some way.
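
The distinction between sequence sets defined by borders and those defined by positions can be made concrete with a small sketch. The Python code below is not the SSA method itself. It only illustrates what a position-defined set is: every sequence carries a reference coordinate (the TSS), and motif occurrences are tallied by their offset from that coordinate. The function names, the parameter `tss_index`, the toy sequences, and the window limits are all assumptions made for illustration. Matching is exact-string matching, which is far simpler than the consensus or weight-matrix matching used in practice.

```python
from collections import Counter


def positional_counts(seqs, tss_index, motif):
    """Tally motif start positions relative to the TSS.

    Assumes every sequence in `seqs` is aligned so that the TSS
    sits at the same 0-based index `tss_index` (hypothetical setup).
    """
    counts = Counter()
    k = len(motif)
    for seq in seqs:
        for i in range(len(seq) - k + 1):
            if seq[i:i + k] == motif:
                # Offset of the motif start from the TSS (negative = upstream)
                counts[i - tss_index] += 1
    return counts


def window_frequency(counts, n_seqs, start, end):
    """Motif occurrences per sequence with start offset in [start, end]."""
    hits = sum(c for pos, c in counts.items() if start <= pos <= end)
    return hits / n_seqs


# Toy data: three 16-bp sequences, TSS assumed at index 10 in each.
toy_seqs = [
    "CCTATAAAGGCCGGCC",
    "GCTATAAACGCGGCGC",
    "GGCCGGCCGGTATAAA",
]
toy_counts = positional_counts(toy_seqs, tss_index=10, motif="TATAAA")
upstream_freq = window_frequency(toy_counts, len(toy_seqs), -10, -5)
```

Note that the window limits passed to `window_frequency` are chosen arbitrarily here, which is precisely the missing information described above: the experiment fixes the TSS, but not the range in which a promoter motif must lie.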
5.1. Modification of the Problem Statements
A common way to proceed in promoter analysis is to define promoters
operationally as sequences extending from arbitrarily chosen distances