Biology Reference
In-Depth Information
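$$
\begin{aligned}
SP(S') &= \sum_{1 \le i < j \le n} PS(R_i, R_j) \\
       &= \sum_{1 \le i < j \le n} \; \sum_{1 \le x \le y} d(R_i[x], R_j[x]) \\
       &= \sum_{1 \le x \le y} \; \sum_{1 \le i < j \le n} d(S_{ix}, S_{jx}) \\
       &= \sum_{1 \le x \le y} SP\begin{pmatrix} S'_{1,x} \\ \vdots \\ S'_{n,x} \end{pmatrix}
\end{aligned}
$$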
where $d : \Sigma \times \Sigma \to \mathbb{R}$ is the scoring function,
$R_i[x]$ (also denoted as $S_{ix}$) is the $x$th symbol of the $i$th row
in the MSA, and $y$ is the row length. The scoring function $d$ can also
be considered as a matrix of predefined scores, where each cell represents
the score of aligning the two corresponding symbols.
The following example illustrates the SP method in detail:
Example: Given the following sequences:
- $S_1$: ACCCGA
- $S_2$: ACTA
- $S_3$: TCCTA
and their alignment $S'$, where - denotes a gap:

$$
S'(S_1, S_2, S_3) =
\begin{cases}
S_1: & \texttt{ACCCGA} \\
S_2: & \texttt{AC--TA} \\
S_3: & \texttt{TCC-TA}
\end{cases}
$$
The SP score of this alignment is:

$$
\begin{aligned}
SP(S') ={} & [d(A,A) + d(A,T) + d(A,T)] + [d(C,C) + d(C,C) + d(C,C)] \\
         & + [d(C,\texttt{-}) + d(C,C) + d(\texttt{-},C)] + [d(C,\texttt{-}) + d(C,\texttt{-}) + d(\texttt{-},\texttt{-})] \\
         & + [d(G,T) + d(G,T) + d(T,T)] + [d(A,A) + d(A,A) + d(A,A)].
\end{aligned}
$$
In practice, mismatch and gap penalty scores are negative values,
and the score of aligning two gaps is ignored. In each step of
the alignment, the SP method calculates the scores of all pairs of
residues for every column, which increases the complexity of the MSA
algorithm by a factor of $O(n^2)$, where $n$ denotes the number of sequences.
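The column-by-column computation described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `d` and `sp_score` and the score values (match +1, mismatch -1, gap -2, gap against gap 0) are assumptions chosen for the example and do not come from the text.

```python
from itertools import combinations

GAP = "-"  # gap symbol used in the aligned rows


def d(a, b, match=1, mismatch=-1, gap=-2):
    """Hypothetical scoring function; the numeric values are assumptions."""
    if a == GAP and b == GAP:
        return 0  # a gap aligned with a gap is ignored
    if a == GAP or b == GAP:
        return gap  # a symbol aligned with a gap
    return match if a == b else mismatch


def sp_score(rows):
    """Sum-of-pairs score: sum d over all row pairs in every column."""
    if len({len(r) for r in rows}) != 1:
        raise ValueError("all rows of an alignment must have the same length")
    total = 0
    for column in zip(*rows):  # one column of the MSA at a time
        for a, b in combinations(column, 2):  # n(n-1)/2 pairs per column
            total += d(a, b)
    return total


# The alignment S' from the example above.
alignment = ["ACCCGA", "AC--TA", "TCC-TA"]
print(sp_score(alignment))  # -3 under the assumed scores
```

With these assumed scores the six columns contribute -1, 3, -3, -4, -1 and 3, giving a total of -3. The inner loop over `combinations(column, 2)` visits every pair of rows in a column, which is where the quadratic dependence on the number of sequences comes from.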
In aligning DNA/RNA sequences, the scoring schemes tend to
be more egalitarian and independent of the symbols; however,
protein sequence alignments require more sophisticated approaches
as amino acids can be divided into various functional classes based on
different similarity parameters. The two most popular score matrices
used for aligning protein sequences are the PAM and BLOSUM
matrices [4]. The motivating idea in developing scoring matrices for