Biomedical Engineering Reference
In-Depth Information
ggggcuauagcucaggcgcuugcauggcaagcaagaggucagu
*******************************************
*********(||||||)**************************
***(||((((||||||))))*****************)*****
***(||((((||||||))))*******(||||)****)*****
***(||((((||||||))))|||(((((||||))))))*****
***(||((((||||||))))|||(((((||||))))))*****
((((||((((||||||))))|||(((((||||)))))))))**
((((||((((||||||))))|||(((((||||)))))))))||
Fig. 6.
Illustration of the recursive process of sampling an RNA secondary structure for an exem-
plary input sequence according to our alternative sampling strategy
the algorithm obviously needs to dynamically determine the respective set of all
valid
choices (during the sampling process itself) before a corresponding probability distri-
bution (needed for drawing a particular random choice) can be derived.
This, however, may cause severe problems as regards the time complexity for ran-
domly sampling the next extension (or base pair). Nevertheless, in order to guarantee
that the worst-case time complexity for drawing any random choice remains in
(
n
)
,
we only need to impose a few restrictions concerning the lengths of single-stranded re-
gions in some types of loops. In detail, we have to consider a maximum allowed number
of nucleotides in unpaired regions of hairpin loops (
max
hairpin
), bulge or interior loops
(
max
bulge
), and multiloops (
max
strand
)
5
.
For example, if
X
O
∈I
G
s
generates hairpin loops, then the set of all possible hairpin
loops that can be validly folded on sequence fragment
R
start,end
is given by
pcHL(
start, end
):=
(
i,j,p
)
|
start
+min
hel
≤
i
≤
j
≤
end
−
min
hel
and
i
+min
HL
−
1
≤
j
≤
i
+max
hairpin
−
1
and
R
i−
min
hel
,j
+min
hel
not folded and
=0
.
p
=
β
X
(
i,j
)
·
α
X
(
i,j
)
(15)
Obviously,
max
hairpin
indeed ensures that
pcHL(
start, end
)
can be computed in
O
(
n
)
time.
Finally, it should be noted that this sampling strategy needs to additionally con-
sider outside probabilities, for two reasons: First, for “normalizing” the resulting sam-
pling probabilities. This is due to the fact that the different possible choices
(
i,j,p
)
usually imply substructures
S
i,j
of different lengths
j − i
+1
, such that only
p
=
· β
X
(
i,j
)
ensures that the probabilities of all possible choices are of the same
order of magnitude and hence imply a reasonable probability distribution for drawing a
random choice. Second, the outside values are required for guaranteeing that sampled
substructures can be validly extended. This means that only such hairpin loops and ex-
α
X
(
i,j
)
5
Note that these restrictions are not as severe as it may seem, since for example choosing the
constant value
30
for all three parameters can be expected to hardly have a negative impact on
the resulting sampling quality. In fact, many MFE based prediction algorithms also restrict the
lengths of particular single-stranded regions, at least for long bulge and interior loops (where
the proposed constant value of
30
is considered a common choice).
Search WWH ::
Custom Search