Biology Reference
In-Depth Information
based algorithms are better suited for alignment of close
sequences, for which individual events of indels can poten-
tially be traced back. In contrast, profile-based algorithms are
better for distantly related sequences, as the resultant MSA
reflects the core features that survive a saturating number of
mutational events including indels.
(4) It is also a common practice to recalculate the distance matrix
and the guide tree from the induced pairwise alignments after
construction of the initial MSA [ 31 , 40 , 52 , 53 ].
The major drawback of the progressive method is well described by
the paraphrase “once a gap, always a gap” [ 54 ], implying that if an
erroneous gap is inserted at an early stage, it is propagated to the
final result without any chance of correction. One method to
circumvent this defect is to use the consistency transformation as
already described in Subheading 2.2 . Another effective approach
relies on post process known as iterative refinement [ 55 - 58 ].
An iterative refinement consists of four steps:
(1) Construct an initial MSA by some method, e.g., by a progres-
sive method.
(2) Horizontally divide the MSA into two groups. Remove the
columns composed of nulls alone from each of the two
groups.
(3) Realign the two groups by a pairwise sequence-to-group or
group-to-group alignment method.
(4) Repeat (2) and (3) until no improvement in the alignment
score is expected or by a predefined number of times.
Obviously, a precise definition of the objective function is indis-
pensable for this procedure to work adequately. Most widely used
objective function is the sum-of-pairs score [ 59 ] or slightly more
general weighted sum-of-pairs score (WSP) [ 60 ] with affine gaps:
2.4 Iterative
Refinement
X
WSP
ð
A
Þ¼
w p;q H a p ; a q
1
p
<
q
N
X
X
;
¼
w p;q Sa p;l
;
a q;l
v
Gp
ð
;
q
;
l
Þ
(4)
1
p
<
q
N
1
l
L
where H ( a p , a q ) is the alignment score of a pair of rows in
(match-
ing nulls are ignored), w p , q is the weight given to the sequence pair
( a p , a q )( w p , q ¼
A
1 in an unweighted case), and G ( p , q , l ) is the Boolean
variable indicating if a gap opens between a p and a q at the column
position l . The heart of this method is the step (3), which should
improve (at least not worsen) the alignment score compared with
that before the division. This problem was first addressed by Gotoh
[ 58 ] who devised a dynamic programming algorithmwith a branch-
and-bound extension that exactly optimizes group-to-group
Search WWH ::




Custom Search