Biomedical Engineering Reference
In-Depth Information
Ta b l e 1 . Comparison of the considered sampling strategies (for an arbitrary input sequence of
length n )
Aspect
Conventional Strategy
Alternative Strategy
O ( n 3 ) for exact calculations,
O ( n 2 ) for approximate variant,
O ( n 2 ) with constant W exact 0
O ( n 3 ) for exact calculations,
O ( n 2 ) for approximate variant,
O ( n 3 ) with constant W exact 0
Preprocessing
time
Constraints
None
Constant
max hairpin ,
max bulge
and
max strand
Characteristics
and
course of
action
Inherently controlled, ordered:
- helices from left to right,
- sampling proceeds “inwards”:
construction of substructure S i,j
starts by considering R i,j and ends
by generating an unpaired region
(usually a hairpin loop)
Extensively more freedom, less restrictive:
- helices in arbitrary order,
- sampling proceeds “outwards”:
construction of new substructure on un-
folded fragment R start,end starts with ran-
dom hairpin loop which is extended to a
complete and valid (paired) substructure
S i,j on R start,end
Benefits of
sampling
direction
(Sub)structures are folded in ac-
cordance with the generation of
the corresponding (unique leftmost)
derivation (sub)tree by the underly-
ing SCFG
Takes more advantage of inside probabil-
ities for shorter fragments containing less
approximated terms and thus less inaccu-
racies (although this potential is narrowed
by the outside values for which the contrary
holds)
Function of
outside values
Not considered (do not influence
sampling distributions)
1) “Normalize” sampling probabilities
2) Ensure valid extensions
Identification
of valid
choices
Not required (all possible choices
are principally valid)
Dynamic checking required (due to depen-
dence on previously folded substructures)
O ( n 2 )
O ( n 2 ) with larger constants
Folding time
O ( n 3 ) with exact variant,
O ( n 2 ) with constant W exact 0
or in completely approximated case
O ( n 3 ) in case of exact computations or
mixed variants using W exact 0 ,
O ( n 2 )
Overall time
complexity
for MP
predictions
only in completely approximated
case
tensions (implying a surrounding base pair i.j ) may be sampled that can actually lead
to the generation of a corresponding valid helix.
We conclude this section by referring to Table 1 that summarizes the main differences
of both sampling strategies. Note that despite the significant algorithmic differences
between both sampling variants (our new strategy is computationally more complex),
there exists a fundamental difference when it comes to producing identical outcomes:
When applying our dynamic strategy, a complete secondary structure (e.g., the one
shown in Figs. 4 and 6) can generally be sampled in more than one way (due to the
higher degree of freedom), whereas with the common strategy, there is always only one
unique way for sampling a particular structure.
5
Applications
Attempting to quantify the decline in accuracy that results from approximative prepro-
cessing, we first considered the E.coli tRNA Ala sequence for probability profiling. The
Search WWH ::




Custom Search