3.2.4. Finding multiple motifs
In exploratory applications, such as mining promoter sequences for new
transcription regulatory motifs, one often expects to find more than one
motif. For instance, a landmark paper on Drosophila promoters 25 reported
10 ab initio discovered motifs returned by MEME in a single program run.
Fortunately, there is a simple and efficient way to extend the basic
algorithms presented above to multiple motif discovery. The principle is to
proceed iteratively by searching for one motif at a time, and by progressively
excluding motif instances found from subsequent iterations. More formally,
this means that, after each cycle, the k-letter subsequences attributed to the
newly discovered motif are removed from the search space, a process that
is commonly referred to as “masking” in the sequence analysis literature.
A theoretically sounder approach would use multi-component mixture
models to optimize several motifs simultaneously by EM, Gibbs sampling,
or a progressive local multiple alignment algorithm.
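The iterative masking scheme described above can be illustrated with a minimal sketch. It is not the procedure of any particular program: the function names, the masking character "N", and above all the toy motif finder (which simply picks the most frequent unmasked k-letter word instead of running EM or Gibbs sampling) are assumptions made for illustration only.

```python
from collections import Counter

MASK = "N"  # assumed masking character; any symbol outside the alphabet would do


def most_frequent_kmer(seqs, k):
    """Toy stand-in for a real motif finder (EM, Gibbs, ...).

    Returns the most frequent k-letter word that contains no masked
    position, or None if no unmasked window of length k is left.
    """
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - k + 1):
            word = s[i:i + k]
            if MASK not in word:
                counts[word] += 1
    if not counts:
        return None
    # Highest count first; ties are broken alphabetically for determinism.
    return min(counts.items(), key=lambda kv: (-kv[1], kv[0]))[0]


def mask_instances(seqs, kmer):
    """Remove the instances of a motif from the search space by masking.

    str.replace masks non-overlapping occurrences, scanning left to right.
    """
    return [s.replace(kmer, MASK * len(kmer)) for s in seqs]


def find_motifs(seqs, k, n_motifs):
    """Search for one motif at a time, masking its instances after each cycle."""
    found = []
    work = list(seqs)
    for _ in range(n_motifs):
        kmer = most_frequent_kmer(work, k)
        if kmer is None:  # search space exhausted
            break
        found.append(kmer)
        work = mask_instances(work, kmer)
    return found
```

In a realistic setting, the call to most_frequent_kmer would be replaced by one full run of the single-motif optimization, and the masking step would cover the sequence positions attributed to the resulting motif rather than exact word matches.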
3.2.5. Estimating the significance of a newly discovered motif
The different types of probability values used as objective functions for
motif optimization do not tell us whether the best motif found is
significant, because they apply to single motifs and are therefore not
corrected for multiple testing. With
consensus sequence motifs, a Bonferroni correction is sometimes applied;
see, for instance, Xie et al. 26 However, this approach is likely to yield
overly conservative P-value estimates, as consensus word frequencies are
highly dependent on each other, especially if mismatches are tolerated.
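As a rough illustration of the Bonferroni correction mentioned above, the sketch below multiplies a single-motif P-value by the number of tests and caps the result at 1. Taking the number of tests to be all 4^k DNA words of length k is an assumption made here for illustration; the helper names are hypothetical, and the cited study may have counted tests differently.

```python
def n_dna_words(k):
    """Number of distinct DNA words of length k (assumed number of tests)."""
    return 4 ** k


def bonferroni(p_value, n_tests):
    """Bonferroni-corrected P-value: min(1, n_tests * p_value)."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie in [0, 1]")
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return min(1.0, p_value * n_tests)


# Example: an 8-letter word with an uncorrected P-value of 1e-6,
# corrected for all 4**8 = 65536 candidate words.
corrected = bonferroni(1e-6, n_dna_words(8))
```

Because the correction treats all 65536 tests as independent, it overstates the penalty when word frequencies are correlated, which is the source of the conservatism noted in the text.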
The program MEME provides significance estimates for matrix-based
motif models based on a maximum likelihood ratio test (LRT), which
takes into account the number of free parameters of the model. 16 This
approach is quite sensitive to the properties of the null model, and in
practice tends to assign low E-values to questionable motifs. A good way
to corroborate the significance of a newly found motif is to rerun
the motif discovery program with randomized or shuffled sequences as
a control, so as to get an idea of what P-values or E-values could be