Biomedical Engineering Reference
In-Depth Information
ggggcuauagcucaggcgcuugcauggcaagcaagaggucagu
*******************************************
*********(||||||)**************************
***(||((((||||||))))*****************)*****
***(||((((||||||))))*******(||||)****)*****
***(||((((||||||))))|||(((((||||))))))*****
***(||((((||||||))))|||(((((||||))))))*****
((((||((((||||||))))|||(((((||||)))))))))**
((((||((((||||||))))|||(((((||||)))))))))||
Fig. 6. Illustration of the recursive process of sampling an RNA secondary structure for an exem-
plary input sequence according to our alternative sampling strategy
the algorithm obviously needs to dynamically determine the respective set of all valid
choices (during the sampling process itself) before a corresponding probability distri-
bution (needed for drawing a particular random choice) can be derived.
This, however, may cause severe problems as regards the time complexity for ran-
domly sampling the next extension (or base pair). Nevertheless, in order to guarantee
that the worst-case time complexity for drawing any random choice remains in
( n ) ,
we only need to impose a few restrictions concerning the lengths of single-stranded re-
gions in some types of loops. In detail, we have to consider a maximum allowed number
of nucleotides in unpaired regions of hairpin loops ( max hairpin ), bulge or interior loops
( max bulge ), and multiloops ( max strand ) 5 .
For example, if X
O
∈I G s generates hairpin loops, then the set of all possible hairpin
loops that can be validly folded on sequence fragment R start,end is given by
pcHL( start, end ):= ( i,j,p )
|
start +min hel
i
j
end
min hel and
i +min HL
1
j
i +max hairpin
1 and
R i− min hel ,j +min hel
not folded and
=0 .
p = β X ( i,j )
·
α X ( i,j )
(15)
Obviously, max hairpin indeed ensures that pcHL( start, end ) can be computed in
O
( n )
time.
Finally, it should be noted that this sampling strategy needs to additionally con-
sider outside probabilities, for two reasons: First, for “normalizing” the resulting sam-
pling probabilities. This is due to the fact that the different possible choices ( i,j,p )
usually imply substructures S i,j of different lengths j − i +1 , such that only p =
· β X ( i,j ) ensures that the probabilities of all possible choices are of the same
order of magnitude and hence imply a reasonable probability distribution for drawing a
random choice. Second, the outside values are required for guaranteeing that sampled
substructures can be validly extended. This means that only such hairpin loops and ex-
α X ( i,j )
5
Note that these restrictions are not as severe as it may seem, since for example choosing the
constant value 30 for all three parameters can be expected to hardly have a negative impact on
the resulting sampling quality. In fact, many MFE based prediction algorithms also restrict the
lengths of particular single-stranded regions, at least for long bulge and interior loops (where
the proposed constant value of 30 is considered a common choice).
 
Search WWH ::




Custom Search