Biomedical Engineering Reference
In-Depth Information
Fast RNA Secondary Structure Prediction
Using Fuzzy Stochastic Models
Markus E. Nebel and Anika Scheid
Department of Computer Science, University of Kaiserslautern,
P.O. Box 3049, D-67653 Kaiserslautern, Germany
{ nebel,a scheid } @cs.uni-kl.de
Abstract. Computational prediction of RNA secondary structures has been an
active area of research over the past decades and since become of great relevance
for practical applications in structural biology. To date, many popular state-of-
the-art prediction tools have the same worst-case time and space requirements of
O ( n 3 ) and
O ( n 2 ) for sequence length n , limiting their applicability for practical
purposes. Accordingly, biologists are interested in getting results faster, where a
moderate loss of accuracy would willingly be tolerated in favor of saving a signif-
icant amount of computation time. Motivated by these facts, we invented a novel
algorithm for predicting the secondary structure of RNA molecules that manages
to reduce the worst-case time complexity by a linear factor to O ( n 2 ) , while on
the other hand it is still capable of producing highly accurate results. Basically,
the presented method relies on a probabilistic statistical sampling approach which
is actually based on an appropriate stochastic context-free grammar (SCFG) :for
any given input sequence, it generates a random set of candidate structures (from
the ensemble of all feasible foldings) according to a “noisy” distribution (obtained
by heuristically approximating the inside-outside values for the input sequence),
such that finally a corresponding prediction can be efficiently derived. Notably,
this method may be employed with different sampling strategies. Therefore, we
not only consider a popular common strategy but also introduce a novel one that
is supposed to fit especially well in connection with fuzzy stochastic models. A
major advantage of the proposed prediction approach is that sampling can easily
be parallelized on modern multi-core architectures or grids. Furthermore, it can
be done in-place, that is only the best (here most probable) candidate structure(s)
generated so far need(s) to be stored and finally collected. The combination of
these two benefits immediately allows for an efficient handling of the increased
sample sizes that are often necessary to achieve competitive prediction accuracy
in connection with the noisy distribution.
1
Introduction
Over the past years, several new approaches towards the prediction of RNA secondary
structures from a single sequence have been invented which are based on generating
statistically representative and reproducible samples of the entire ensemble of feasible
Corresponding author. The research of this author is supported by Carl Zeiss Foundation,
Germany.
Search WWH ::




Custom Search