However, if we know that $S_{1,i}$ aligns to $S_{3,l}$ in a third sequence $S_3$
and $S_{3,l}$ aligns well to $S_{2,k}$, then we can choose to align $S_{1,i}$ to $S_{2,k}$.
For example, in the given sequences $S_1$ and $S_2$, the “FASTCAT”
substring of $S_2$ aligns comparably well to both the “LASTFAT” and
“FATCAT” substrings of $S_1$. The existence of a third sequence $S_3$
resolves this ambiguity as follows:
S1 : GARFIELDTHE LAST FA-T CAT
S3 : GARFIELDTHE VERY FAST CAT
S2 : GARFIELDTHE ---- FAST CAT
Here, $w(S_1, S_3) = 77$ and $w(S_3, S_2) = 100$. The weight of the
alignment of $S_1$ and $S_2$ through $S_3$ is
$w(S_1, S_2) = \min(w(S_1, S_3), w(S_3, S_2)) = 77$, so we update the
weight of the alignment of $S_1$ and $S_2$ in the primary library with a
new score of $77 + 88 = 165$. Although the alignment through $S_3$ scores
lower than the optimum pairwise alignment of $S_1$ and $S_2$, it provides
a better overall MSA.
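The weight update in this example can be sketched in a few lines. This is a minimal illustration that follows the sequence-level simplification used in the text (T-Coffee itself stores weights for pairs of residues); the function name `extended_weight` and the dictionary layout of `primary` are assumptions made for the sketch, not part of T-Coffee.

```python
def extended_weight(primary, a, b):
    """Direct weight of (a, b) plus, for every third sequence c,
    the weaker of the two links a-c and c-b."""
    def w(x, y):
        # Pairs with no entry in the library contribute a weight of 0.
        return primary.get(frozenset((x, y)), 0)

    others = {s for pair in primary for s in pair} - {a, b}
    return w(a, b) + sum(min(w(a, c), w(c, b)) for c in others)


# Weights taken from the GARFIELD example in the text.
primary = {
    frozenset(("S1", "S2")): 88,
    frozenset(("S1", "S3")): 77,
    frozenset(("S3", "S2")): 100,
}

print(extended_weight(primary, "S1", "S2"))  # 165
```

Taking the minimum means an indirect path is only as trustworthy as its weakest link, which is why the contribution through the third sequence is 77 rather than 100.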
Finally, T-Coffee produces its final MSA by applying a traditional
progressive alignment approach to the modified pairwise scores in the
secondary library. An appealing feature of T-Coffee is that the program
accepts user-provided input sequences for the primary library. Moreover,
the latest version of T-Coffee includes structural information for
improved multiple protein alignments [ 14 ].
3.3 MAFFT
MAFFT, a high-speed multiple sequence alignment program,
uses the Fast Fourier Transform (FFT) to identify homologous
regions quickly after converting amino acid sequences into two
feature vectors [ 15 ]. These feature vectors, which are composed
of six components in total, represent the volume and polarity of the
amino acids in each sequence [ 16 ]. The motivating idea in MAFFT is
that highly correlated sequences may have homologous regions; sequence
correlation is calculated by FFT of the normalized volume and polarity
vectors, $\hat{v}(a)$ and $\hat{p}(a)$, respectively:
$$\hat{v}(a) = [v(a) - \bar{v}] / \sigma_v, \qquad \hat{p}(a) = [p(a) - \bar{p}] / \sigma_p.$$
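As a rough illustration of the normalization step, the sketch below standardizes a per-residue property table by subtracting its mean and dividing by its standard deviation. It assumes the population standard deviation is intended, and the values in `toy_volume` are made-up placeholders, not real amino acid volumes; `normalize` is a hypothetical helper name.

```python
from statistics import mean, pstdev


def normalize(prop):
    """Map each residue's raw value x to (x - mean) / sd, where mean
    and sd are computed over every residue in the table."""
    values = list(prop.values())
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    return {aa: (x - mu) / sigma for aa, x in prop.items()}


# Hypothetical toy values, NOT measured amino acid volumes.
toy_volume = {"A": 1.0, "G": 2.0, "L": 3.0, "W": 6.0}
v_hat = normalize(toy_volume)
```

After this step the table has zero mean and unit spread, so volume and polarity contribute on a comparable scale when their correlations are added.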
Correlation between two sequences is then defined as:
$$c(k) = c_v(k) + c_p(k),$$
where
$$c_v(k) = \sum_{1 \le n \le N,\; 1 \le n+k \le M} \hat{v}_1(n)\, \hat{v}_2(n+k),$$
$$c_p(k) = \sum_{1 \le n \le N,\; 1 \le n+k \le M} \hat{p}_1(n)\, \hat{p}_2(n+k),$$
and $N$ and $M$ denote the lengths of the two sequences.
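To make the role of the lag $k$ concrete, the sketch below evaluates the correlation directly: for each $k$ it sums the products of normalized values at positions $n$ in the first sequence and $n + k$ in the second, for volume and polarity separately, and adds the two sums. This is the plain summation, not the FFT speed-up that MAFFT relies on; the function names and the toy input vectors are assumptions for illustration only.

```python
def correlation(x1, x2, k):
    """Sum of x1(n) * x2(n + k) over 1 <= n <= N and 1 <= n + k <= M,
    using the 1-based positions of the text."""
    N, M = len(x1), len(x2)
    total = 0.0
    for n in range(1, N + 1):
        if 1 <= n + k <= M:
            total += x1[n - 1] * x2[n + k - 1]
    return total


def c(v1, p1, v2, p2, k):
    """Combined correlation: volume term plus polarity term."""
    return correlation(v1, v2, k) + correlation(p1, p2, k)


def best_lag(v1, p1, v2, p2):
    """Lag k with the highest combined correlation."""
    N, M = len(v1), len(v2)
    return max(range(-(N - 1), M), key=lambda k: c(v1, p1, v2, p2, k))


# Hypothetical normalized vectors: sequence 2 contains sequence 1
# shifted right by two positions.
v1, p1 = [1.0, -1.0, 0.5], [0.5, 0.5, -1.0]
v2, p2 = [0.0, 0.0, 1.0, -1.0, 0.5], [0.0, 0.0, 0.5, 0.5, -1.0]

print(best_lag(v1, p1, v2, p2))  # 2
```

A peak in $c(k)$ at some lag suggests that the two sequences may share a homologous region offset by $k$ positions. Evaluating every lag this way costs on the order of $N \times M$ operations, which is the cost the FFT formulation avoids.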