Biology Reference
In-Depth Information
where
b and 0 otherwise. Note that, in practice, only
a few subsequences … the likely motif instances … will make signifi-
cant contributions to the new model. The new mixture coefficient q is
obtained as the sum of posterior probabilities for all k -letter subsequences
divided by the search space:
δ
(x i j
=
b) is 1 if x i =
N
1
= Ê
ˆ
.
k
+
1
˜ Â
k
q
w
(10)
Á
N
j
=
1
Expectation maximization is a deterministic algorithm that can get
trapped in a local optimum. To have a chance of reaching the globally
optimal motif, it has to start from a good seed, which is already relatively
close to the target. Two strategies are used for this purpose. One is to trig-
ger the algorithm from a large number of random seeds; this is time-
consuming and thus relatively inefficient. A better way is to use a consensus
sequence obtained from a fast exact or heuristic word search algorithm, as
described above. The combination of a word search algorithm for the seed-
ing step and EM for refinement is probably the most effective motif
discovery strategy used nowadays, and is implemented in various forms
in many software tools. MEME, for instance, uses seeds obtained from
exhaustive pairwise comparison of k -letter subsequences, an approach
which leads to good seeds but has the unfavorable time complexity O ( N 2 )
for the seeding step.
As explained before, the mixture-model-based EM approach does
not account for the nonindependent occurrence of overlapping sub-
sequences. The HMM framework offers an explicit way of computing
posterior motif probabilities, under the constraint that motif
instances must not overlap, via the choice of an appropriate model
architecture. In all other respects, the “training” of a HMM, i.e. the
iterative optimization of the model with respect to the input
sequence set, is identical to the optimization of a mixture model.
3.2.3. Optimizing the motif annotation
Keeping the same probabilistic mixture model framework, the motif discov-
ery problem can also be formulated in such a way that the motif positions
Search WWH ::




Custom Search