Biology Reference
In-Depth Information
n 2 X
n
1
CMscore
ð
X
Y
Þ¼
CMap XY ð
i
i
Þ
;
;
i
¼
1
n 2 X
n 2 X
X
n
n
n
1
1
xy ii ¼
x ij y ji
¼
i
¼
1
i
¼
1
j
¼
1
We only need to calculate the diagonal values in CMap XY in
implementation, so as to speed up the program. What's more, the
pairwise distance between sequences X and Y can be calculated as:
W 4 Optscore
ð
X
Y
Þ
;
d
ð
X
Y
Þ¼
1
W 5 CMscore
ð
X
Y
Þ
(8)
;
;
f
n 1 ;
n 2 g
min
The sum of W 4 and W 5 is 1. They are used to control the
influence of sequences X and Y .
A guide tree is constructed by the UPGMA method based on the
linear combinatorial strategy [ 18 ]. Referring to the calculation
scheme adapted in MSAProbs, the distance between a new cluster
Z formed by merging clusters X and Y , and another cluster W is:
3.3 Construction
of Guide Tree and
Transformation of
Posterior Probability
d
ð
W
X
Þ
Num
ð
X
Þþ
d
ð
W
Y
Þ
Num
ð
Y
Þ
;
;
d
ð
W
Z
Þ¼
(9)
;
ð
X
Þþ
ð
Y
Þ
Num
Num
where Num( X ) is the number of leafs in cluster X .
After constructing the guide tree, we adapted the sequence
weighting scheme in [ 4 ]. Based on the weights, we transformed
the original posterior probability in terms of the equation below, so
as to reduce the bias of sampling similar sequences:
!
X
1
wN
P 0 XY ¼
ð
w X
þ
w Y
Þ
P XY
þ
w z P XZ P ZY
Þ
(10)
Z
2
S
;
Z
X
;
Y
w X and w Y are the weight of sequences X and Y , Z represents any
sequence other than X or Y in the given sequence group, w Z is the
weight of Z , and wN is the sum of sequence weights in dataset S .
We use the guide tree to create a multiple sequence alignment by
progressively aligning two clusters of the most similar sequences
together. During the progressive alignment process, we apply a
weighted profile-profile alignment to align two clusters of
sequences according to the weighting scheme introduced in the
previous step. The posterior alignment probability matrix of two
clusters/profiles is averaged from the probability matrices of all
sequence pairs ( X , Y ), in which x and y are from two different
clusters. In the earlier stage, global profile-profile alignment is
generated based on the posterior alignment probability matrices
of the profiles in terms of the equation ( 5 ). Furthermore, we use a
randomized iterative alignment to refine the initial alignment, so as
to improve the alignment accuracy. As the iterative refinement
3.4 Combination
of Initial Progressive
Alignment and Final
Iterative Alignment
Refinement
Search WWH ::




Custom Search