Fast RNA Secondary Structure Prediction Using Fuzzy Stochastic Models - Biomedical Engineering Systems and Technologies

Biomedical Engineering Reference

In-Depth Information

ggggcuauagcucaggcgcuugcauggcaagcaagaggucagu

*******************************************

*********(||||||)**************************

***(||((((||||||))))*****************)*****

***(||((((||||||))))*******(||||)****)*****

***(||((((||||||))))|||(((((||||))))))*****

((((||((((||||||))))|||(((((||||)))))))))**

((((||((((||||||))))|||(((((||||)))))))))||

Fig. 6. Illustration of the recursive process of sampling an RNA secondary structure for an exem-

plary input sequence according to our alternative sampling strategy

the algorithm obviously needs to dynamically determine the respective set of all valid

choices (during the sampling process itself) before a corresponding probability distri-

bution (needed for drawing a particular random choice) can be derived.

This, however, may cause severe problems as regards the time complexity for ran-

domly sampling the next extension (or base pair). Nevertheless, in order to guarantee

that the worst-case time complexity for drawing any random choice remains in

( n ) ,

we only need to impose a few restrictions concerning the lengths of single-stranded re-

gions in some types of loops. In detail, we have to consider a maximum allowed number

of nucleotides in unpaired regions of hairpin loops ( max hairpin ), bulge or interior loops

( max bulge ), and multiloops ( max strand ) 5 .

For example, if X

∈I G s generates hairpin loops, then the set of all possible hairpin

loops that can be validly folded on sequence fragment R start,end is given by

pcHL( start, end ):= ( i,j,p )

start +min hel ≤

≤

end

−

min hel and

i +min HL −

≤

i +max hairpin −

1 and

R i− min hel ,j +min hel

not folded and

=0 .

p = β X ( i,j )

α X ( i,j )

(15)

Obviously, max hairpin indeed ensures that pcHL( start, end ) can be computed in

( n )

time.

Finally, it should be noted that this sampling strategy needs to additionally con-

sider outside probabilities, for two reasons: First, for “normalizing” the resulting sam-

pling probabilities. This is due to the fact that the different possible choices ( i,j,p )

usually imply substructures S i,j of different lengths j − i +1 , such that only p =

· β X ( i,j ) ensures that the probabilities of all possible choices are of the same

order of magnitude and hence imply a reasonable probability distribution for drawing a

random choice. Second, the outside values are required for guaranteeing that sampled

substructures can be validly extended. This means that only such hairpin loops and ex-

α X ( i,j )

Note that these restrictions are not as severe as it may seem, since for example choosing the

constant value 30 for all three parameters can be expected to hardly have a negative impact on

the resulting sampling quality. In fact, many MFE based prediction algorithms also restrict the

lengths of particular single-stranded regions, at least for long bulge and interior loops (where

the proposed constant value of 30 is considered a common choice).

Biomedical Engineering Systems and Technologies

Search WWH ::

Custom Search

Home