Biology Reference
In-Depth Information
An MSA of S is defined as a set S 0 of sequences s 1 ;
s 0 n whose
lengths are m , defined in the same alphabet with an additional gap
symbol. In other words, gaps are inserted in the sequences s 1 ;
s 2 ; ...;
s n
so that their lengths become the same and a matrix representation
where rows represent sequences and columns represent aligned sym-
bols depicts the multiple alignment. The goal is to maximize a function
s : S 0
s 2 ; ...;
that returns the score of probable alignments. Our aim is
formally defined as follows:
a ¼
! R
argmax
a
s
ð
a
ð
S
ÞÞ;
S 0
2
where a ( S ) denotes an alignment of a given set of sequences S .
Each column of an alignment must contain at least one symbol of
which implies that there cannot be any columns composed of only
gap characters. Finding a certain optimal alignment is a computation-
ally intensive task and dynamic programming methodology, which
guarantees the optimal alignment is not directly applicable to MSA as
in the case of pairwise alignment. The reason for this is the vast number
of probable alignments, which is exponentially related to the number
of given sequences and length of sequences. Indeed, assuming a simple
scheme to calculate the score of each column is O ( n ), the complexity of
an MSA is O ( n 2 n L n ) where L is max
Σ
ð
j ;
s 1
j ; ...;
s 2
jjÞ
s n
, n is the number
of sequences, and s i
denotes the length of s i [ 1 ]. Therefore, heuristic
methods are employed to the MSA problem in order to perform faster;
however, these methods do not guarantee the optimal alignment.
jj
2
Scoring a Multiple Sequence Alignment
Cost of an alignment must be defined to distinguish the optimal
multiple alignment from other possible alignments. Basically, we
need an assessment, which is carried out using an objective function
to score an MSA [ 2 ].
Sum of Pairs (SP) [ 3 ] is one of the most popular objective function
measures used in multiple sequence alignment. The SP score of an
MSA involving n sequences is calculated by adding scores of all
n
2
2.1
Sum of Pairs
! induced pairwise alignments. The SP score is defined as
follows:
X
S 0 Þ¼
SP
ð
PS
ð
R i
;
R j
Þ;
1
i
j
n
where n denotes the number of sequences, PS is a pairwise scoring
function, and R denotes a row of an alignment.
Since pairwise scoring is column based, SP scoring can also be
viewed as such, where each column is considered independently:
Search WWH ::




Custom Search