Biology Reference
In-Depth Information
n
2
X
n
1
CMscore
ð
X
Y
Þ¼
CMap
XY
ð
i
i
Þ
;
;
i
¼
1
n
2
X
n
2
X
X
n
n
n
1
1
xy
ii
¼
x
ij
y
ji
¼
i
¼
1
i
¼
1
j
¼
1
We only need to calculate the diagonal values in CMap
XY
in
implementation, so as to speed up the program. What's more, the
pairwise distance between sequences
X
and
Y
can be calculated as:
W
4
Optscore
ð
X
Y
Þ
;
d
ð
X
Y
Þ¼
1
W
5
CMscore
ð
X
Y
Þ
(8)
;
;
f
n
1
;
n
2
g
min
The sum of
W
4
and
W
5
is 1. They are used to control the
influence of sequences
X
and
Y
.
A guide tree is constructed by the UPGMA method based on the
linear combinatorial strategy [
18
]. Referring to the calculation
scheme adapted in MSAProbs, the distance between a new cluster
Z
formed by merging clusters
X
and
Y
, and another cluster
W
is:
3.3 Construction
of Guide Tree and
Transformation of
Posterior Probability
d
ð
W
X
Þ
Num
ð
X
Þþ
d
ð
W
Y
Þ
Num
ð
Y
Þ
;
;
d
ð
W
Z
Þ¼
(9)
;
ð
X
Þþ
ð
Y
Þ
Num
Num
where Num(
X
) is the number of leafs in cluster
X
.
After constructing the guide tree, we adapted the sequence
weighting scheme in [
4
]. Based on the weights, we transformed
the original posterior probability in terms of the equation below, so
as to reduce the bias of sampling similar sequences:
!
X
1
wN
P
0
XY
¼
ð
w
X
þ
w
Y
Þ
P
XY
þ
w
z
P
XZ
P
ZY
Þ
(10)
Z
2
S
;
Z
6¼
X
;
Y
w
X
and
w
Y
are the weight of sequences
X
and
Y
,
Z
represents any
sequence other than
X
or
Y
in the given sequence group,
w
Z
is the
weight of
Z
, and
wN
is the sum of sequence weights in dataset
S
.
We use the guide tree to create a multiple sequence alignment by
progressively aligning two clusters of the most similar sequences
together. During the progressive alignment process, we apply a
weighted profile-profile alignment to align two clusters of
sequences according to the weighting scheme introduced in the
previous step. The posterior alignment probability matrix of two
clusters/profiles is averaged from the probability matrices of all
sequence pairs (
X
,
Y
), in which
x
and
y
are from two different
clusters. In the earlier stage, global profile-profile alignment is
generated based on the posterior alignment probability matrices
of the profiles in terms of the equation (
5
). Furthermore, we use a
randomized iterative alignment to refine the initial alignment, so as
to improve the alignment accuracy. As the iterative refinement
3.4 Combination
of Initial Progressive
Alignment and Final
Iterative Alignment
Refinement