MSACompro: Improving Multiple Protein Sequence Alignment by Predicted Structural Features - Multiple Sequence Alignment Methods

Biology Reference

In-Depth Information

n 2 X

CMscore

Þ¼

CMap XY ð

;

n 2 X

xy ii ¼

x ij y ji

We only need to calculate the diagonal values in CMap XY in

implementation, so as to speed up the program. What's more, the

pairwise distance between sequences X and Y can be calculated as:

W 4 Optscore

;

Þ¼

W 5 CMscore

(8)

;

n 1 ;

n 2 g

min

The sum of W 4 and W 5 is 1. They are used to control the

influence of sequences X and Y .

A guide tree is constructed by the UPGMA method based on the

linear combinatorial strategy [ 18 ]. Referring to the calculation

scheme adapted in MSAProbs, the distance between a new cluster

Z formed by merging clusters X and Y , and another cluster W is:

3.3 Construction

of Guide Tree and

Transformation of

Posterior Probability

Num

Þþ

Num

;

Þ¼

(9)

;

Þþ

Num

where Num( X ) is the number of leafs in cluster X .

After constructing the guide tree, we adapted the sequence

weighting scheme in [ 4 ]. Based on the weights, we transformed

the original posterior probability in terms of the equation below, so

as to reduce the bias of sampling similar sequences:

P 0 XY ¼

w X

w Y

P XY

w z P XZ P ZY

(10)

;

6¼

;

w X and w Y are the weight of sequences X and Y , Z represents any

sequence other than X or Y in the given sequence group, w Z is the

weight of Z , and wN is the sum of sequence weights in dataset S .

We use the guide tree to create a multiple sequence alignment by

progressively aligning two clusters of the most similar sequences

together. During the progressive alignment process, we apply a

weighted profile-profile alignment to align two clusters of

sequences according to the weighting scheme introduced in the

previous step. The posterior alignment probability matrix of two

clusters/profiles is averaged from the probability matrices of all

sequence pairs ( X , Y ), in which x and y are from two different

clusters. In the earlier stage, global profile-profile alignment is

generated based on the posterior alignment probability matrices

of the profiles in terms of the equation ( 5 ). Furthermore, we use a

randomized iterative alignment to refine the initial alignment, so as

to improve the alignment accuracy. As the iterative refinement

3.4 Combination

of Initial Progressive

Alignment and Final

Iterative Alignment

Refinement

Multiple Sequence Alignment Methods

Search WWH ::

Custom Search

Home