Consequently, in FFT form $c_v(k)$ is represented as:

$$c_v(k) = \mathcal{F}^{-1}\left[\,V_1^{*}(m)\,V_2(m)\,\right](k)$$
where the asterisk (*) denotes complex conjugation. MAFFT applies a sliding-window approach with a 30-residue window to locate the positions of homologous segments. Dynamic programming (DP) is then used to align these segments optimally, gradually joining them into a full alignment. As in most MSA programs, MAFFT uses guide trees and similarity matrices. Another improvement proposed in MAFFT is the use of a normalized similarity matrix and normalized gap penalties, so that all pairwise scores are positive and the cost of multi-position gaps can be computed quickly [ 15 ]. The following formula is used to fill in the entries of the similarity matrix:
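$$\hat{M}_{ab} = \frac{M_{ab} - \mathrm{average}_2}{\mathrm{average}_1 - \mathrm{average}_2} + \hat{S}^{a}$$

$$\mathrm{average}_1 = \sum_{a} f_a M_{aa}, \qquad \mathrm{average}_2 = \sum_{a,b} f_a f_b M_{ab}$$

where $a$ and $b$ denote residues, $M_{ab}$ is the raw similarity score, $f_a$ denotes the frequency of residue $a$, and $\hat{S}^{a}$ is a gap extension penalty.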
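The FFT shortcut for $c_v(k)$ rests on the correlation theorem. The following Python sketch assumes the definition $c(k) = \sum_n v_1(n)\,v_2(n+k)$ and a single numeric property per residue. The property values are hypothetical: MAFFT derives its values from amino acid properties, which are not reproduced here.

The sketch computes $c(k)$ in two ways and checks that they agree:

* with a zero-padded radix-2 FFT, multiplying the conjugate of $V_1$ by $V_2$;
* with the plain double loop.

The function names are illustrative only.

```python
import cmath

def fft(a, invert=False):
    """Recursive radix-2 Cooley-Tukey FFT; len(a) must be a power of two.
    With invert=True it computes the unnormalized inverse transform."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for j in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * j / n) * odd[j]
        out[j] = even[j] + t
        out[j + n // 2] = even[j] - t
    return out

def correlation_fft(v1, v2):
    """c(k) = sum_n v1[n] * v2[n + k] for every lag k, computed via FFT."""
    n, m = len(v1), len(v2)
    size = 1
    while size < n + m - 1:  # zero-pad so circular correlation equals linear
        size *= 2
    V1 = fft([complex(x) for x in v1] + [0j] * (size - n))
    V2 = fft([complex(x) for x in v2] + [0j] * (size - m))
    # Correlation theorem: the transform of c is conj(V1) * V2.
    c = fft([a.conjugate() * b for a, b in zip(V1, V2)], invert=True)
    # Divide by size to normalize; negative lags wrap to the end of the array.
    return {k: c[k % size].real / size for k in range(-(n - 1), m)}

def correlation_direct(v1, v2):
    """Same quantity by the O(N*M) double loop, for comparison."""
    n, m = len(v1), len(v2)
    return {k: sum(v1[i] * v2[i + k] for i in range(n) if 0 <= i + k < m)
            for k in range(-(n - 1), m)}

v1 = [1.0, 2.0, 3.0]               # hypothetical property values
v2 = [0.0, 1.0, 2.0, 3.0, 0.0]
c = correlation_fft(v1, v2)
print(max(c, key=c.get))           # 1
```

The lag with the largest correlation is k = 1, where the pattern 1, 2, 3 lines up. Such a peak marks a candidate offset between homologous regions. The FFT route costs O(L log L) for padded length L, instead of the O(NM) of the direct loop.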
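To make the sliding-window step concrete, the Python sketch below scans one diagonal of two sequences with a 30-residue window. The diagonal is a fixed offset `lag`, for example one suggested by a correlation peak. The sketch merges overlapping high-scoring windows into segments.

The window score (fraction of identical residues) and the `min_identity` threshold are assumptions for illustration. MAFFT's own segment scoring is not reproduced here, and `homologous_segments` is a hypothetical name.

```python
def homologous_segments(s1, s2, lag, window=30, min_identity=0.6):
    """Return (start_in_s1, start_in_s2, length) segments on the diagonal
    that pairs s1[i] with s2[i + lag]. Scoring is a hypothetical identity
    fraction, not MAFFT's actual segment score."""
    # Positions i in s1 whose partner i + lag falls inside s2.
    start = max(0, -lag)
    end = min(len(s1), len(s2) - lag)
    matches = [1 if s1[i] == s2[i + lag] else 0 for i in range(start, end)]
    segments = []
    for w0 in range(len(matches) - window + 1):
        if sum(matches[w0:w0 + window]) >= min_identity * window:
            i = start + w0
            # Extend the previous segment if this window overlaps or touches it.
            if segments and i <= segments[-1][0] + segments[-1][2]:
                s_i, s_j, _ = segments[-1]
                segments[-1] = (s_i, s_j, i + window - s_i)
            else:
                segments.append((i, i + lag, window))
    return segments

s1 = "ACGT" * 10
s2 = "TT" + s1                       # s1 shifted by two residues
print(homologous_segments(s1, s2, 2))  # [(0, 2, 40)]
```

The segments found this way are what the subsequent DP step would align and chain together into a full alignment.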
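The normalization formula can be checked numerically. The Python sketch below applies it to a toy two-letter alphabet with made-up scores and frequencies. It treats $\hat{S}^{a}$ as a single hypothetical constant `gap_ext`.

One property follows directly from the formula when the frequencies sum to 1:

* the frequency-weighted average of the normalized diagonal is $1 + \hat{S}$;
* the overall average is $\hat{S}$.

```python
def normalize_matrix(M, freq, gap_ext):
    """Normalized similarity matrix in the form given in the text.

    M: dict (a, b) -> raw score M_ab; freq: dict a -> frequency f_a
    (assumed to sum to 1); gap_ext: constant used for S-hat (hypothetical).
    """
    alphabet = list(freq)
    avg1 = sum(freq[a] * M[(a, a)] for a in alphabet)
    avg2 = sum(freq[a] * freq[b] * M[(a, b)] for a in alphabet for b in alphabet)
    return {(a, b): (M[(a, b)] - avg2) / (avg1 - avg2) + gap_ext
            for a in alphabet for b in alphabet}

freq = {"A": 0.5, "B": 0.5}                      # toy alphabet and frequencies
M = {("A", "A"): 4, ("A", "B"): -1, ("B", "A"): -1, ("B", "B"): 6}
N = normalize_matrix(M, freq, gap_ext=0.1)
diag = sum(freq[a] * N[(a, a)] for a in freq)
overall = sum(freq[a] * freq[b] * N[(a, b)] for a in freq for b in freq)
print(round(diag, 6), round(overall, 6))         # 1.1 0.1
```

Whether every normalized entry ends up positive depends on the raw matrix and on the value chosen for $\hat{S}^{a}$. With these toy numbers the off-diagonal entry is still negative.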
The MAFFT algorithm proceeds in two sequential phases. The first part of phase one, called FFT-NS-1 (FFT algorithm and Normalized Similarity matrix), involves calculating pairwise distances, constructing a UPGMA tree, and performing progressive alignment along this initial guide tree. In the second part of the first phase, FFT-NS-2, MAFFT refines the distance matrix and the guide tree by recomputing them from the FFT-NS-1 alignment. In the second phase, consistency-based scoring is combined with iterative refinement. The G-INS-i module builds the library from global pairwise alignments, L-INS-i uses local pairwise alignments with affine gap costs to form the library, and E-INS-i uses local alignments with a generalized affine gap cost [ 17 ].
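FFT-NS-1 builds its initial guide tree with UPGMA. The minimal Python sketch below runs UPGMA on a hypothetical distance matrix and returns the guide tree as nested tuples. It does not model how MAFFT computes the pairwise distances themselves.

```python
from itertools import combinations

def upgma(labels, dist):
    """Build a UPGMA guide tree as nested tuples.

    labels: leaf names; dist: symmetric matrix (list of lists) of distances.
    """
    # Active clusters: id -> (subtree, number of leaves it contains).
    clusters = {i: (name, 1) for i, name in enumerate(labels)}
    # Pairwise cluster distances keyed by (smaller id, larger id).
    d = {(i, j): dist[i][j] for i, j in combinations(range(len(labels)), 2)}
    next_id = len(labels)
    while len(clusters) > 1:
        i, j = min(d, key=d.get)  # closest pair of active clusters
        (ti, ni), (tj, nj) = clusters.pop(i), clusters.pop(j)
        old = d
        # Keep only distances that involve neither merged cluster.
        d = {k: v for k, v in old.items() if i not in k and j not in k}
        for c in clusters:
            dci = old[(min(c, i), max(c, i))]
            dcj = old[(min(c, j), max(c, j))]
            # UPGMA: leaf-count-weighted average of the two old distances.
            d[(c, next_id)] = (ni * dci + nj * dcj) / (ni + nj)
        clusters[next_id] = ((ti, tj), ni + nj)
        next_id += 1
    tree, _ = next(iter(clusters.values()))
    return tree

dist = [[0, 2, 6, 10],
        [2, 0, 6, 10],
        [6, 6, 0, 10],
        [10, 10, 10, 0]]
print(upgma(["A", "B", "C", "D"], dist))  # ('D', ('C', ('A', 'B')))
```

In progressive alignment, the innermost pairs of such a tree are aligned first. Here A is aligned with B, then C is added, and finally D.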
3.4 MUSCLE
MUSCLE (MUltiple Sequence Comparison by Log-Expectation) is an efficient progressive alignment method for accurately aligning large numbers of nucleic acid and protein sequences [ 18 ]. MUSCLE has two fundamental stages: progressive alignment and iterative refinement. First, MUSCLE produces a temporary MSA using k-mer distances and UPGMA clustering. Because MUSCLE starts without any prior alignment, the k-mer distance has to be computed on unaligned sequences. In the next step MUSCLE therefore switches to the Kimura distance [ 19 ], which is computed from aligned sequences. Namely, the temporary MSA found in the first stage is used to estimate this more accurate distance measure [ 18 ]. Subsequently, MUSCLE uses the