Objective Functions - Multiple Sequence Alignment Methods

Consequently, in FFT form, $c_v(k)$ is represented as:

$$c_v(k) \Leftrightarrow V_1^{*}(m) \cdot V_2(m)$$

where "∗" denotes complex conjugation. MAFFT applies a sliding-window approach with a window size of 30 residues to locate homologous segments. DP is then used to align these segments optimally, and the aligned segments are gradually joined into a full alignment. Like most MSA programs, MAFFT uses guide trees and similarity matrices. Another improvement proposed in MAFFT is the use of a normalized similarity matrix and gap penalties, so that all pairwise scores are positive and the cost of multi-position gaps can be computed quickly [15]. The following formula is used to fill in the entries of the similarity matrix:

$$\hat{M}_{ab} = \frac{M_{ab} - \mathrm{average2}}{\mathrm{average1} - \mathrm{average2}} + S^{a}$$

where $\mathrm{average1} = \sum_{a} f_a M_{aa}$, $\mathrm{average2} = \sum_{a,b} f_a f_b M_{ab}$, $a$ and $b$ denote residues, $f_a$ denotes the frequency of residue $a$, and $S^{a}$ is a gap extension penalty.
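As a rough illustration of the normalization above, the following sketch rescales a small similarity matrix. It is not MAFFT's code: the function name `normalize_matrix`, the two-letter alphabet, the toy scores, and the gap extension value 0.1 are all assumptions made for the example, and average2 is taken here as the frequency-weighted mean over all residue pairs $M_{ab}$.

```python
def normalize_matrix(M, freqs, gap_ext):
    """Rescale M so that M_hat[a][b] = (M[a][b] - avg2) / (avg1 - avg2) + gap_ext.

    M       : dict of dicts with raw similarity scores M[a][b]
    freqs   : dict mapping each residue a to its frequency f_a
    gap_ext : the additive term S^a (hypothetical value chosen by the caller)
    """
    residues = list(freqs)
    # average1: frequency-weighted mean of the diagonal (identical residues)
    avg1 = sum(freqs[a] * M[a][a] for a in residues)
    # average2: frequency-weighted mean over all residue pairs
    avg2 = sum(freqs[a] * freqs[b] * M[a][b] for a in residues for b in residues)
    scale = avg1 - avg2
    return {
        a: {b: (M[a][b] - avg2) / scale + gap_ext for b in residues}
        for a in residues
    }


# Toy two-residue alphabet with made-up scores and equal frequencies.
toy_M = {"A": {"A": 2.0, "B": -1.0}, "B": {"A": -1.0, "B": 2.0}}
toy_f = {"A": 0.5, "B": 0.5}
M_hat = normalize_matrix(toy_M, toy_f, gap_ext=0.1)
```

By construction, the frequency-weighted mean of the diagonal of the rescaled matrix equals 1 + `gap_ext`, and the frequency-weighted mean over all pairs equals `gap_ext`, whatever the scale of the input matrix.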

The MAFFT algorithm proceeds in two sequential phases. The first part of phase one, called FFT-NS-1 (FFT algorithm and the Normalized Similarity matrix), involves calculating pairwise distances, constructing a UPGMA tree, and performing progressive alignment guided by this initial tree. In the second part of the first phase, FFT-NS-2, MAFFT improves the distance matrix and the guide tree. In the second phase, consistency-based scoring is combined with iterative refinement. Three modules are available: G-INS-i builds its library from global pairwise alignments, L-INS-i builds it from local pairwise alignments with affine gaps, and E-INS-i uses local alignments with a generalized affine gap cost [17].
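The FFT step behind the FFT-NS names rests on the correlation identity given earlier, in which $c(k) = \sum_n v_1(n)\, v_2(n+k)$ is obtained by transforming both signals, multiplying the conjugate of one transform by the other, and transforming back. The sketch below shows only that identity on made-up numeric vectors standing in for a per-residue property; it is not MAFFT's implementation, and the names `fft`, `correlation`, and `best_lag` are hypothetical helpers written for this example.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for m in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * m / n) * odd[m]
        out[m] = even[m] + w
        out[m + n // 2] = even[m] - w
    return out


def correlation(v1, v2):
    """Return c with c[k] = sum_n v1[n] * v2[n + k], via IFFT(conj(V1) * V2).

    Inputs are zero-padded so the circular correlation does not wrap around.
    Negative lags k are stored at index len(c) + k.
    """
    size = 1
    while size < len(v1) + len(v2):
        size *= 2
    V1 = fft([complex(x) for x in v1] + [0j] * (size - len(v1)))
    V2 = fft([complex(x) for x in v2] + [0j] * (size - len(v2)))
    product = [a.conjugate() * b for a, b in zip(V1, V2)]
    return [z.real / size for z in fft(product, inverse=True)]


def best_lag(v1, v2):
    """Lag k at which v2 shifted by k best matches v1 (highest correlation)."""
    c = correlation(v1, v2)
    k = max(range(len(c)), key=lambda i: c[i])
    # Indices below len(v2) are non-negative lags; the rest are negative lags.
    return k if k < len(v2) else k - len(c)


# Made-up signals: v2 is v1 shifted two positions to the right.
v1 = [0, 0, 1, 2, 3, 0, 0, 0]
v2 = [0, 0, 0, 0, 1, 2, 3, 0]
lag = best_lag(v1, v2)
```

A peak in the correlation at lag k suggests that the two sequences share a similar region when one is offset by k positions, which is the kind of candidate that the sliding-window step then examines.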

3.4 MUSCLE

MUSCLE (MUltiple Sequence Comparison by Log-Expectation) is an efficient progressive alignment method for accurately aligning large numbers of nucleic acid and protein sequences [18]. MUSCLE has two fundamental steps: progressive alignment and iterative refinement. First, MUSCLE produces a temporary MSA using k-mer distance measures and the UPGMA clustering method. Because MUSCLE starts without any prior alignment, the k-mer distance is used at this stage, since it can be computed for unaligned sequences; in the next step, MUSCLE switches to the Kimura distance [19]. That is, the temporary MSA found in the first phase is used to derive a more accurate distance measure [18]. Subsequently, MUSCLE uses the
