Objective Functions - Multiple Sequence Alignment Methods

However, if we know that $S_{1,i}$ aligns to $S_{3,l}$ in a third sequence $S_3$, and $S_{3,l}$ aligns well to $S_{2,k}$, then we can choose to align $S_{1,i}$ to $S_{2,k}$. For example, in the given sequences $S_1$ and $S_2$, the “FASTCAT” substring of $S_2$ can be aligned comparably well to either the “LASTFAT” or the “FATCAT” substring of $S_1$. The existence of a third sequence $S_3$ resolves this ambiguity as follows:

    S 1 :  GARFIELD THE LAST FA-T CAT
    S 3 :  GARFIELD THE VERY FAST CAT
    S 2 :  GARFIELD THE ---- FAST CAT

Here, $w(S_1, S_3) = 77$ and $w(S_3, S_2) = 100$. The weight of the alignment of $S_1$ and $S_2$ through $S_3$ is $\min(w(S_1, S_3), w(S_3, S_2)) = 77$, so we update the weight of the alignment of $S_1$ and $S_2$ in the primary library to a new score of $77 + 88 = 165$. Although the resulting alignment of $S_1$ and $S_2$ scores lower than their optimum pairwise alignment, it yields a better overall MSA.
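The weight update in this example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `extended_weight` and the dictionary layout are hypothetical and are not part of T-Coffee, and the direct weight of 88 for the pair $S_1$, $S_2$ is read off the sum 77 + 88 given in the text.

```python
def extended_weight(direct, paths):
    """Add to the direct pair weight the contribution of each intermediate
    sequence, taken as the minimum of the two weights along that path."""
    return direct + sum(min(w_first, w_second) for w_first, w_second in paths)

# Pairwise weights from the GARFIELD example (hypothetical data layout).
primary = {
    ("S1", "S2"): 88,   # direct weight, as implied by "77 + 88" in the text
    ("S1", "S3"): 77,
    ("S3", "S2"): 100,
}

# Only one intermediate sequence (S3) is available in this example.
paths_via_third = [(primary[("S1", "S3")], primary[("S3", "S2")])]
extended = extended_weight(primary[("S1", "S2")], paths_via_third)
print(extended)  # 165
```

With more sequences, each additional intermediate sequence would contribute one more `(w_first, w_second)` pair to the list.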

Finally, T-Coffee produces its final MSA by applying a traditional progressive alignment approach to the modified pairwise scores in the secondary library. An appealing option of T-Coffee is that the program accepts user-provided input sequences for the primary library. Moreover, the latest version of T-Coffee incorporates structural information to improve multiple protein alignments [14].

3.3 MAFFT

MAFFT, a high-speed multiple sequence alignment program, uses the Fast Fourier Transform (FFT) to identify homologous regions quickly after converting amino acid sequences into two feature vectors [15]. These feature vectors, which are composed of six components in total, represent the volume and polarity of the amino acids [16]. The motivating idea in MAFFT is that highly correlated sequences may have homologous regions. The sequence correlation is calculated by FFT of the normalized volume and polarity vectors, $\hat{v}(a)$ and $\hat{p}(a)$, respectively:

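$$\hat{v}(a) = [v(a) - \bar{v}]/\sigma_v, \qquad \hat{p}(a) = [p(a) - \bar{p}]/\sigma_p,$$

where $\bar{v}$ and $\sigma_v$ ($\bar{p}$ and $\sigma_p$) are the mean and standard deviation of volume (polarity) over the 20 amino acids.

The correlation between two sequences at positional lag $k$ is then defined as:

$$c(k) = c_v(k) + c_p(k),$$

where

$$c_v(k) = \sum_{1 \le n \le N,\; 1 \le n+k \le M} \hat{v}_1(n)\,\hat{v}_2(n+k),$$

$$c_p(k) = \sum_{1 \le n \le N,\; 1 \le n+k \le M} \hat{p}_1(n)\,\hat{p}_2(n+k),$$

and $N$ and $M$ denote the lengths of the two sequences.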
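To make the correlation concrete, the following Python sketch evaluates $c(k)$ directly from the double sum for two short sequences. It is a minimal illustration, not MAFFT's implementation: it uses the direct O(NM) sum rather than the FFT that MAFFT uses to obtain the same quantity quickly, and the property table `TOY_PROPS` holds made-up placeholder values over a five-letter alphabet, not real amino acid volumes or polarities. The function names are likewise hypothetical.

```python
from statistics import mean, pstdev

# Placeholder (volume, polarity) values; NOT real physico-chemical data.
TOY_PROPS = {
    "A": (1.0, 3.0),
    "C": (2.0, 5.0),
    "F": (3.0, 1.0),
    "S": (4.0, 4.0),
    "T": (5.0, 2.0),
}

def normalize(props):
    """Return {letter: (v_hat, p_hat)}, each property centred and scaled
    by its mean and standard deviation over the alphabet."""
    vols = [v for v, _ in props.values()]
    pols = [p for _, p in props.values()]
    v_bar, p_bar = mean(vols), mean(pols)
    s_v, s_p = pstdev(vols), pstdev(pols)
    return {a: ((v - v_bar) / s_v, (p - p_bar) / s_p)
            for a, (v, p) in props.items()}

def correlation(seq1, seq2, table):
    """Return {k: c(k)} with c(k) = c_v(k) + c_p(k), computed by direct
    summation over all positions n where both n and n + k are in range."""
    n_len, m_len = len(seq1), len(seq2)
    result = {}
    for k in range(-(n_len - 1), m_len):
        total = 0.0
        for n in range(n_len):          # 0-based position in seq1
            m = n + k                   # matching position in seq2
            if 0 <= m < m_len:
                v1, p1 = table[seq1[n]]
                v2, p2 = table[seq2[m]]
                total += v1 * v2 + p1 * p2
        result[k] = total
    return result

table = normalize(TOY_PROPS)
corr = correlation("FASTCAT", "TTFASTCAT", table)
best_lag = max(corr, key=corr.get)
print(best_lag)  # 2: the second sequence contains the first, shifted by two
```

The peak at lag 2 reflects the shifted copy of the first sequence inside the second. In MAFFT, peaks of $c(k)$ serve as candidates for homologous regions.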
