Fast RNA Secondary Structure Prediction Using Fuzzy Stochastic Models - Biomedical Engineering Systems and Technologies

ggggcuauagcucaggcgcuugcauggcaagcaagaggucagu

*******************************************

(***************************************)**

((((*********************************))))**

((((||(************)*****************))))**

((((||((((||||||))))*****************))))**

((((||((((||||||))))|||(************)))))**

((((||((((||||||))))|||(((((||||)))))))))**

((((||((((||||||))))|||(((((||||)))))))))||

Fig. 4. Illustration of the recursive process of sampling an RNA secondary structure for an exemplary input sequence according to the common sampling strategy. Base pairs are represented by pairs of corresponding brackets ( ), unpaired bases by the symbol |, and bases that have not been resolved yet (i.e., that have not yet been determined to be paired or unpaired) by the symbol *.
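The notation of Fig. 4 can be reproduced with a few lines of Python. This is only an illustrative sketch: the function name `render_partial_structure` and the representation of a partial structure as a list of base pairs plus a list of unpaired positions (both 1-based) are assumptions made here, not part of the original method.

```python
def render_partial_structure(n, pairs, unpaired):
    """Render a partially sampled structure of length n in the Fig. 4 notation.

    pairs    -- iterable of (i, j) base pairs, 1-based, i < j
    unpaired -- iterable of 1-based positions already fixed as unpaired
    Positions that are not mentioned are still unresolved and shown as '*'.
    """
    symbols = ["*"] * n
    for i, j in pairs:
        symbols[i - 1] = "("
        symbols[j - 1] = ")"
    for k in unpaired:
        symbols[k - 1] = "|"
    return "".join(symbols)


# Toy example: two stacked pairs, two unpaired bases, four unresolved bases.
print(render_partial_structure(10, [(1, 9), (2, 8)], [3, 10]))  # ((|****))|
```

Each row of the figure corresponds to one such rendering after a further sampling step has resolved more positions.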

4.2 Alternative Strategy

Unfortunately, the common sampling strategy from Sect. 4.1 cannot take full advantage of the exact inside values $\alpha_X(i,j)$, for $X \in I_{G_s}$ and $j - i + 1 \le W_{\mathrm{exact}}$, obtained by employing a particular mixed preprocessing variant according to $0 \le W_{\mathrm{exact}} < n$. In particular, the strategy in general inevitably has to sample the first base pairs from the corresponding conditional probability distributions for rather large fragments $R_{i,j}$ with $j - i + 1 > W_{\mathrm{exact}}$, which are induced by approximated sampling probabilities rather than exact ones. Therefore, we designed an alternative to this well-established sampling strategy that follows the opposite principle, resulting in a reverse sampling direction.
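The role of the window $W_{\mathrm{exact}}$ can be made concrete with a small sketch. The helper names `is_exact` and `count_fragments` are hypothetical and only encode the length condition stated above; they do not compute any inside values.

```python
def is_exact(i, j, w_exact):
    """True if fragment R_{i,j} (1-based, inclusive) lies within the exact window."""
    return j - i + 1 <= w_exact


def count_fragments(n, w_exact):
    """Count fragments of a length-n sequence with exact vs. approximated values."""
    # There are n - length + 1 fragments of each given length.
    exact = sum(n - length + 1 for length in range(1, min(w_exact, n) + 1))
    total = n * (n + 1) // 2
    return exact, total - exact


# The top-down strategy starts on the whole sequence R_{1,n}, which is never
# covered by the exact window because W_exact < n.
print(is_exact(1, 43, 10))     # False
print(count_fragments(5, 2))   # (9, 6)
```

Because the outermost fragments are always longer than the window, a strategy that works from the outside in makes its earliest choices from approximated values, which is the limitation the reverse sampling direction is meant to address.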

Basically, a complete secondary structure $S_{1,n}$ for a given input sequence $r$ of length $n$ can alternatively, albeit unconventionally, be sampled in the following (deliberately less controlled) way: Start with the entire RNA sequence $R_{\mathit{start},\mathit{end}} = R_{1,n}$ and randomly construct adjacent substructures (paired substructures preceded by potentially empty single-stranded regions) of the exterior loop on the considered sequence fragment $R_{\mathit{start},\mathit{end}}$ (where the construction does not follow a particular order, e.g. does not sample from left to right), until no further paired substructure can be folded. Any (paired) substructure on fragment $R_{\mathit{start},\mathit{end}}$, $1 \le \mathit{start} \le \mathit{end} \le n$, is created by sampling a random hairpin loop (with closing base pair $i.j$, for $\mathit{start} < i < j < \mathit{end}$) and extending it (towards the ends of $R_{\mathit{start},\mathit{end}}$) by successively drawing closing base pairs. When sampling the hairpin loop, we can take advantage of exact inside values from a mixed preprocessing, since $i$ and $j$ are most likely close. During this extension, basically all known substructures (stacked pairs, bulges, interior and multibranched loops, subject to certain restrictions that will be discussed later) may be folded, where each substructure (e.g. a multiloop) has to be completed before its closing base pair is added and the corresponding helix can actually be extended further (see footnote 4). The process of folding a particular paired substructure ends with a complete and valid paired structure (of the currently folding multiloop or of the exterior loop), either with or without a directly preceding unpaired region, both on the

4 Note that the sampling strategy proposed here is only heuristic in that it does not sample structures precisely according to the distribution implied by the underlying SCFG. See [18] for a discussion of why this is unavoidable when sampling inside-out.
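The inside-out idea described in this section (first a short hairpin, then extension towards the fragment ends) can be sketched in a strongly simplified form. The code below is not the authors' procedure. It rests on several assumptions: uniform random choices stand in for the SCFG-derived conditional probabilities, only stacked pairs are used for the extension (no bulges, interior loops or multiloops), the fragment bounds are treated as inclusive, and the names `can_pair`, `sample_paired_substructure`, `max_span`, `MIN_LOOP` and the stacking probability `0.8` are all hypothetical.

```python
import random

# Assumption: canonical Watson-Crick and G-U wobble pairs only.
CANONICAL = {("a", "u"), ("u", "a"), ("g", "c"), ("c", "g"), ("g", "u"), ("u", "g")}
MIN_LOOP = 3  # assumed minimum number of unpaired bases in a hairpin loop


def can_pair(seq, i, j):
    """True if positions i and j (1-based) hold complementary bases."""
    return (seq[i - 1], seq[j - 1]) in CANONICAL


def sample_paired_substructure(seq, start, end, max_span, rng):
    """Sample one helix inside-out on fragment [start, end] (inclusive, 1-based).

    Returns the base pairs from the hairpin's closing pair outwards,
    or an empty list if no hairpin fits into the fragment.
    """
    # Step 1: choose a short hairpin; max_span plays the role of a window in
    # which exact values would be available (here the choice is uniform).
    candidates = [
        (i, j)
        for i in range(start, end + 1)
        for j in range(i + MIN_LOOP + 1, min(end, i + max_span - 1) + 1)
        if can_pair(seq, i, j)
    ]
    if not candidates:
        return []
    i, j = rng.choice(candidates)
    pairs = [(i, j)]
    # Step 2: extend the helix towards the fragment ends by stacked pairs.
    while (i - 1 >= start and j + 1 <= end
           and can_pair(seq, i - 1, j + 1) and rng.random() < 0.8):
        i, j = i - 1, j + 1
        pairs.append((i, j))
    return pairs


seq = "ggggcuauagcucaggcgcuugcauggcaagcaagaggucagu"
print(sample_paired_substructure(seq, 1, len(seq), 10, random.Random(0)))
```

The first pair in the returned list always spans at most `max_span` positions, so it is the only choice that would draw on the exact window; every later pair is longer, which mirrors the reversed direction relative to the strategy of Sect. 4.1.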
