Biomedical Engineering Reference
In-Depth Information
ggggcuauagcucaggcgcuugcauggcaagcaagaggucagu
*******************************************
(***************************************)**
((((*********************************))))**
((((||(************)*****************))))**
((((||((((||||||))))*****************))))**
((((||((((||||||))))|||(************)))))**
((((||((((||||||))))|||(((((||||)))))))))**
((((||((((||||||))))|||(((((||||)))))))))||
Fig. 4. Illustration of the recursive process of sampling an RNA secondary structure for an
exemplary input sequence according to the common sampling strategy. Note that base pairs are
represented by pairs of corresponding brackets (), unpaired bases by symbols |, and bases
which have not been solved yet (i.e., which have not been determined to be paired or unpaired so
far) by symbols *.
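The notation used in Fig. 4 can be read mechanically. The following is a minimal sketch, not part of the original text, that turns one line of the figure into lists of base pairs, unpaired positions and unsolved positions. The function name `parse_structure` and the use of 1-based positions are assumptions made for illustration.

```python
def parse_structure(notation):
    """Parse a Fig. 4 style string into (pairs, unpaired, unsolved).

    '(' and ')' denote the two bases of a pair, '|' an unpaired base and
    '*' a base that has not been solved yet. Positions are 1-based.
    """
    open_positions = []  # stack of positions of not yet closed '('
    pairs, unpaired, unsolved = [], [], []
    for pos, symbol in enumerate(notation, start=1):
        if symbol == "(":
            open_positions.append(pos)
        elif symbol == ")":
            if not open_positions:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.append((open_positions.pop(), pos))
        elif symbol == "|":
            unpaired.append(pos)
        elif symbol == "*":
            unsolved.append(pos)
        else:
            raise ValueError(f"unexpected symbol {symbol!r} at position {pos}")
    if open_positions:
        raise ValueError(f"unmatched '(' at position {open_positions[-1]}")
    return sorted(pairs), unpaired, unsolved


# Last line of Fig. 4: the completely sampled structure.
final = "((((||((((||||||))))|||(((((||||)))))))))||"
pairs, unpaired, unsolved = parse_structure(final)
print(len(pairs), len(unpaired), len(unsolved))
```

Applied to the successive lines of Fig. 4, the list of unsolved positions shrinks from the whole sequence to nothing, which is the outside-in progression the caption describes.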
4.2 Alternative Strategy
Unfortunately, the common sampling strategy from Sect. 4.1 cannot take full advantage of the
exact inside values $\alpha_X(i,j)$, for $X \in \mathcal{I}_{G_s}$ and
$j - i + 1 \le W_{\text{exact}}$, obtained by employing a particular mixed preprocessing
variant according to $0 \le W_{\text{exact}} < n$. In particular, the strategy generally has
to sample the first base pairs from the corresponding conditional probability distributions
for rather large fragments $R_{i,j}$ with $j - i + 1 > W_{\text{exact}}$, and these
distributions are induced by approximated sampling probabilities rather than exact ones.
We therefore designed an alternative to this well-established sampling strategy that follows
the opposite principle, resulting in a reverse sampling direction.
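The window condition can be made concrete with a small sketch. It is an illustration only: the function names and the value `w_exact = 10` are assumptions, not taken from the text. The code checks whether a fragment R_{i,j} is short enough to have an exact inside value and counts how many fragments of a sequence of length n fall into that class.

```python
def has_exact_inside_value(i, j, w_exact):
    """True if fragment R_{i,j} (1-based, inclusive) has length <= w_exact."""
    return (j - i + 1) <= w_exact


def count_exact_fragments(n, w_exact):
    """Number of fragments R_{i,j}, 1 <= i <= j <= n, of length <= w_exact."""
    # There are n - length + 1 fragments of each given length.
    return sum(n - length + 1 for length in range(1, min(w_exact, n) + 1))


n = 43          # length of the example sequence in Fig. 4
w_exact = 10    # hypothetical window size, chosen only for illustration

# The first pair drawn by the common strategy in Fig. 4 spans positions 1..41,
# far beyond the window, so it relies on approximated values.
print(has_exact_inside_value(1, 41, w_exact))

# A hairpin closing pair such as 28.33 lies inside the window.
print(has_exact_inside_value(28, 33, w_exact))

print(count_exact_fragments(n, w_exact), "of", n * (n + 1) // 2, "fragments")
```

Under these assumed numbers, the outermost pairs that the common strategy samples first are exactly the ones outside the exact window, while hairpin closing pairs tend to lie inside it.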
Basically, a complete secondary structure $S_{1,n}$ for a given input sequence $r$ of length
$n$ can alternatively, albeit unconventionally, be sampled in the following (deliberately
less controlled) way: Start with the entire RNA sequence
$R_{\text{start},\text{end}} = R_{1,n}$ and randomly construct adjacent substructures
(paired substructures preceded by potentially empty single-stranded regions) of the exterior
loop on the considered sequence fragment $R_{\text{start},\text{end}}$, until no further
paired substructure can be folded; the construction does not follow a particular order, e.g.
it does not sample from left to right. Any (paired) substructure on a fragment
$R_{\text{start},\text{end}}$, $1 \le \text{start} < \text{end} \le n$, is created by
sampling a random hairpin loop (with closing base pair $i.j$, for
$\text{start} < i < j < \text{end}$) and extending it (towards the ends of
$R_{\text{start},\text{end}}$) by successively drawing closing base pairs. When sampling the
hairpin loop, we can take advantage of the exact inside values from a mixed preprocessing,
since $i$ and $j$ are most likely close. During this extension, basically all known
substructures (stacked pairs, bulges, interior and multibranched loops, which obey certain
restrictions that will be discussed later) may be folded, where each substructure (e.g. a
multiloop) has to be completed before its closing base pair is added and the corresponding
helix can actually be extended further (see footnote 4). The process of folding a particular
paired substructure ends with a complete and valid paired structure (of the currently
folding multiloop or of the exterior loop), either with or without a directly preceding
unpaired region, both on the
4 Note that the sampling strategy proposed here is only heuristic in that it does not sample
structures precisely according to the distribution implied by the underlying SCFG. See [18]
for a discussion of why this is unavoidable when sampling inside-out.
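The inside-out direction described in Sect. 4.2 can be sketched in a strongly simplified form. The following toy code is not the authors' sampler: it draws the hairpin closing pair uniformly at random instead of from SCFG-induced conditional distributions, extends the helix only by stacked pairs (no bulges, interior loops or multiloops), and builds a single paired substructure. All names, the minimum hairpin size of 3, the stacking probability of 0.8 and the set of allowed pairs are assumptions made for illustration.

```python
import random

# Assumed set of allowed base pairs (Watson-Crick plus G-U wobble).
CANONICAL = {("a", "u"), ("u", "a"), ("c", "g"), ("g", "c"), ("g", "u"), ("u", "g")}
MIN_HAIRPIN = 3      # assumed minimum number of unpaired bases in a hairpin loop
STACK_PROB = 0.8     # arbitrary probability of adding one more stacked pair


def can_pair(seq, i, j):
    """True if the bases at 1-based positions i and j may pair."""
    return (seq[i - 1], seq[j - 1]) in CANONICAL


def sample_helix_inside_out(seq, start, end, rng):
    """Sample a hairpin inside R_{start,end}, then extend it outward.

    Returns the base pairs innermost first, or [] if no hairpin fits.
    """
    # Hairpin closing pairs i.j with start < i < j < end, as stated in the text.
    candidates = [
        (i, j)
        for i in range(start + 1, end)
        for j in range(i + MIN_HAIRPIN + 1, end)
        if can_pair(seq, i, j)
    ]
    if not candidates:
        return []
    i, j = rng.choice(candidates)  # uniform choice: a stand-in for real probabilities
    pairs = [(i, j)]
    # Extend towards the ends of the fragment by stacked pairs only.
    while (i - 1 > start and j + 1 < end
           and can_pair(seq, i - 1, j + 1)
           and rng.random() < STACK_PROB):
        i, j = i - 1, j + 1
        pairs.append((i, j))
    return pairs


def render(n, pairs):
    """Render pairs in the notation of Fig. 4, '*' marking unsolved bases."""
    symbols = ["*"] * n
    for i, j in pairs:
        symbols[i - 1], symbols[j - 1] = "(", ")"
    return "".join(symbols)


seq = "ggggcuauagcucaggcgcuugcauggcaagcaagaggucagu"
helix = sample_helix_inside_out(seq, 1, len(seq), random.Random(0))
print(render(len(seq), helix))
```

In contrast to Fig. 4, where the outermost pair is fixed first, the first decision here concerns a short fragment, which is where exact inside values would be available in a mixed preprocessing.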
 