The score of a candidate miRNA x is defined as

$$S(x) = \sum_{i=1}^{7} s_i(x),$$

where $s_i(x)$ is the contribution of feature $i$ to the score and is computed as

$$s_i(x) = \log_2\!\left(\frac{f_i(x)}{b_i(x)}\right).$$

Here $f_i(x)$ is an estimate of the frequency of the value of the $i$th
feature of candidate $x$ in the training set of reference
miRNAs, while $b_i(x)$ is a similar estimate computed over a background
set of stem-loops. For the reference set of miRNAs, which was very small,
these estimates were obtained by smoothing the empirical frequency
distributions. In combination with large-scale cloning, this method raised
the number of validated C. elegans miRNAs to 88 [98], while the analogous study in
human led to an estimated upper bound of 255 human miRNA genes [48].
This latter estimate was obtained by considering candidates that are
conserved up to fish.
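To make the scoring function concrete, here is a minimal Python sketch of a log-odds score of the form S(x) = Σ log2(f_i(x)/b_i(x)). It is not the authors' implementation. The two discretized features, their values ("low"/"high"), the toy training rows, and the use of additive (Laplace) pseudocount smoothing are all hypothetical stand-ins: the source says only that the reference distributions were smoothed, without specifying how, and the original method uses seven features.

```python
import math
from collections import Counter


def smoothed_frequencies(values, support, pseudocount=1.0):
    """Additive (Laplace) smoothing of the empirical distribution of `values` over `support`.

    The pseudocount is an assumption; it keeps every frequency > 0 so log2 is defined.
    """
    counts = Counter(values)
    total = len(values) + pseudocount * len(support)
    return {v: (counts[v] + pseudocount) / total for v in support}


def feature_tables(rows, support, n_features):
    """Build one smoothed frequency table per feature column of `rows`."""
    return [
        smoothed_frequencies([row[i] for row in rows], support)
        for i in range(n_features)
    ]


def log_odds_score(candidate, ref_tables, bg_tables):
    """S(x) = sum over features i of s_i(x) = log2(f_i(x) / b_i(x))."""
    return sum(
        math.log2(ref_tables[i][value] / bg_tables[i][value])
        for i, value in enumerate(candidate)
    )


# Hypothetical toy data: two discretized features per stem-loop
# (placeholders for the seven features of the original method).
SUPPORT = ["low", "high"]
reference = [("high", "high"), ("high", "low"), ("high", "high")]
background = [("low", "low"), ("low", "high"), ("low", "low"), ("high", "low")]

ref_tables = feature_tables(reference, SUPPORT, n_features=2)
bg_tables = feature_tables(background, SUPPORT, n_features=2)

score_mirna_like = log_odds_score(("high", "high"), ref_tables, bg_tables)
score_background_like = log_odds_score(("low", "low"), ref_tables, bg_tables)
print(score_mirna_like, score_background_like)
```

A positive score means the candidate's feature values are more typical of the reference miRNAs than of the background stem-loops. With these toy tables, ("high", "high") scores log2(4.32) ≈ 2.11 and ("low", "low") scores log2(0.18) ≈ -2.47.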
The first method that could in principle predict miRNAs in individual
genomes was ProMir [99], which employed a hidden Markov
model. The structures of putative miRNA precursors were predicted
using programs from the Vienna package [100].
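Programs in the Vienna package, such as RNAfold, report a predicted secondary structure in dot-bracket notation: matching parentheses mark paired bases and dots mark unpaired ones. As a hedged illustration of how such output can be turned into the base pairs between the two arms of a stem-loop, here is a small Python helper. The function `pair_table` and the example structure are hypothetical and are not part of the Vienna package's API.

```python
def pair_table(dot_bracket):
    """Return {position: partner} (0-based) for every paired base in a dot-bracket string."""
    open_positions = []  # stack of '(' positions still waiting for their ')'
    partners = {}
    for pos, symbol in enumerate(dot_bracket):
        if symbol == "(":
            open_positions.append(pos)
        elif symbol == ")":
            if not open_positions:
                raise ValueError(f"unmatched ')' at position {pos}")
            opening = open_positions.pop()
            partners[opening] = pos
            partners[pos] = opening
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {pos}")
    if open_positions:
        raise ValueError(f"unmatched '(' at position {open_positions[-1]}")
    return partners


# Hypothetical hairpin: position 2 is an unpaired (bulged) base on the 5' arm.
structure = "((.((...))))"
partners = pair_table(structure)
stem_pairs = sorted((i, j) for i, j in partners.items() if i < j)
print(stem_pairs)  # [(0, 11), (1, 10), (3, 9), (4, 8)]
```

The stack mirrors how nested parentheses close in last-opened, first-closed order, so each ")" is matched to the most recent unmatched "(".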
Describing the predicted hybrid formed between the 5′ and the 3′
arms of a stem-loop in the terminology of sequence alignments, one can
then distinguish the following states:
match (M): one of the possible base pairs (A-U, U-A, G-C, C-G, G-U, U-G);
mismatch (N): one of the remaining (non-matching) base pairs;
insertion (I): a base on the 5′ arm is bulged (A-, C-, G-, U-); or
deletion (D): a base on the 3′ arm is bulged (-A, -C, -G, -U).
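The state alphabet above can be illustrated by labeling a given arm-to-arm alignment column by column. This Python sketch is an illustration under stated assumptions, not ProMir's code. It assumes the 3′ arm has already been written 3′→5′ and aligned to the 5′ arm with "-" as the gap symbol, the function names are hypothetical, and it ignores any further refinement of the states. In ProMir the states are hidden and inferred by the model rather than read off a fixed alignment.

```python
# Base pairs that count as a match (M): Watson-Crick pairs plus G-U wobble.
ALLOWED_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
GAP = "-"  # assumed gap symbol


def column_state(base_5p, base_3p):
    """Classify one aligned column of the stem-loop hybrid as M, N, I or D."""
    if base_5p == GAP and base_3p == GAP:
        raise ValueError("a column cannot be a gap on both arms")
    if base_3p == GAP:
        return "I"  # base on the 5' arm is bulged
    if base_5p == GAP:
        return "D"  # base on the 3' arm is bulged
    if (base_5p, base_3p) in ALLOWED_PAIRS:
        return "M"
    return "N"  # any of the remaining, non-matching base pairs


def state_path(arm_5p, arm_3p_reversed):
    """Return the state of every column of an arm-to-arm alignment."""
    if len(arm_5p) != len(arm_3p_reversed):
        raise ValueError("aligned arms must have equal length")
    return "".join(column_state(a, b) for a, b in zip(arm_5p, arm_3p_reversed))


# Hypothetical alignment: the 3' arm is written 3'->5' so that each column
# holds the two bases facing each other in the hybrid.
print(state_path("GCA-UG", "CGAUA-"))  # MMNDMI
```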
To be able to predict the precise location of the start and end of the
mature miRNA in the stem-loop, the hidden states have an additional
qualifier, namely true (inside the duplex that involves the mature
miRNA) or false (outside of this duplex). The transition probabilities
between states therefore depend on this qualifier, as well as on the type
of state (M, N, I, D). By cross-validation, the authors showed that
ProMir [99] achieves a sensitivity of 73% and a specificity of 96%. Although
this specificity seems very good, the remaining 4% false-positive rate, applied
to the very large number of stem-loops in the human genome, will presumably generate thousands of false-positive
predictions. Therefore, in order to predict miRNA in the human