Heuristic Alignment Methods - Multiple Sequence Alignment Methods

Biology Reference

In-Depth Information

based algorithms are better suited for alignment of close

sequences, for which individual events of indels can poten-

tially be traced back. In contrast, profile-based algorithms are

better for distantly related sequences, as the resultant MSA

reflects the core features that survive a saturating number of

mutational events including indels.

(4) It is also a common practice to recalculate the distance matrix

and the guide tree from the induced pairwise alignments after

construction of the initial MSA [ 31 , 40 , 52 , 53 ].

The major drawback of the progressive method is well described by

the paraphrase “once a gap, always a gap” [ 54 ], implying that if an

erroneous gap is inserted at an early stage, it is propagated to the

final result without any chance of correction. One method to

circumvent this defect is to use the consistency transformation as

already described in Subheading 2.2 . Another effective approach

relies on post process known as iterative refinement [ 55 - 58 ].

An iterative refinement consists of four steps:

(1) Construct an initial MSA by some method, e.g., by a progres-

sive method.

(2) Horizontally divide the MSA into two groups. Remove the

columns composed of nulls alone from each of the two

groups.

(3) Realign the two groups by a pairwise sequence-to-group or

group-to-group alignment method.

(4) Repeat (2) and (3) until no improvement in the alignment

score is expected or by a predefined number of times.

Obviously, a precise definition of the objective function is indis-

pensable for this procedure to work adequately. Most widely used

objective function is the sum-of-pairs score [ 59 ] or slightly more

general weighted sum-of-pairs score (WSP) [ 60 ] with affine gaps:

2.4 Iterative

Refinement

X

WSP

ð

A

Þ¼

w p;q H a p ; a q

1

p

<

q

N

X

;

¼

w p;q Sa p;l

;

a q;l

v

Gp

ð

;

q

;

l

Þ

(4)

1

p

<

q

N

1

l

L

where H ( a p , a q ) is the alignment score of a pair of rows in

(match-

ing nulls are ignored), w p , q is the weight given to the sequence pair

( a p , a q )( w p , q ¼

A

1 in an unweighted case), and G ( p , q , l ) is the Boolean

variable indicating if a gap opens between a p and a q at the column

position l . The heart of this method is the step (3), which should

improve (at least not worsen) the alignment score compared with

that before the division. This problem was first addressed by Gotoh

[ 58 ] who devised a dynamic programming algorithmwith a branch-

and-bound extension that exactly optimizes group-to-group

Multiple Sequence Alignment Methods

Search WWH ::

Custom Search

Home