protein sequences is that the substitution of different amino acid
pairs should be treated differently. For example, the substitution of a
hydrophilic amino acid with another hydrophilic amino acid, which
can be considered a conservative substitution, should be penalized
less severely than substitution with a hydrophobic amino acid. To
obtain a biological justification for the penalties assigned to
various substitutions, Dayhoff and colleagues studied 34 protein
superfamilies divided into 71 groups of homologous proteins [5].
Within each group, sequences were more than 85 % similar, and the
total number of changes was 1,572. Substitution frequencies for
different amino acid pairs were derived from the evolutionary trees
built for each group.
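The idea of penalizing substitutions differently can be sketched with a toy scoring function. The residue classes below are a simplified subset, and the score values (`match`, `same_class`, `cross_class`) are hypothetical numbers chosen only for illustration; they are not taken from any published matrix.

```python
# Simplified hydropathy classes covering only a subset of the 20 amino
# acids (an assumption made for this sketch).
HYDROPHOBIC = set("AVLIMF")
HYDROPHILIC = set("RKDENQ")


def residue_class(aa):
    """Return the coarse hydropathy class of a one-letter residue code."""
    if aa in HYDROPHOBIC:
        return "hydrophobic"
    if aa in HYDROPHILIC:
        return "hydrophilic"
    raise ValueError(f"residue {aa!r} is not covered by this toy example")


def substitution_score(a, b, match=2, same_class=-1, cross_class=-3):
    """Score one aligned pair; the default values are hypothetical."""
    if a == b:
        return match
    if residue_class(a) == residue_class(b):
        return same_class  # mild penalty: chemical character is preserved
    return cross_class     # harsher penalty: hydrophilic <-> hydrophobic


def score_ungapped(seq1, seq2):
    """Sum the pair scores of two equal-length, ungapped sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have the same length")
    return sum(substitution_score(a, b) for a, b in zip(seq1, seq2))


print(score_ungapped("KDL", "REL"))  # two within-class substitutions
print(score_ungapped("KDL", "LEL"))  # one cross-class substitution
```

A two-class scheme like this is far coarser than a real substitution matrix, which assigns a separate score to every amino acid pair.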
2.2 Point Accepted Mutation
"Point Accepted Mutation" (PAM) scoring matrices are calculated
from the substitution rates that Dayhoff et al. derived from these
evolutionary trees. Entries in the scoring matrix represent the
likelihood of replacing an amino acid X by an amino acid Y. PAM
matrices are denoted by a rate, e.g., a 1-PAM matrix assumes the
sequences to be aligned are 99 % identical, hence the accepted point
mutation rate is 1 %. The score of a given substitution is the ratio
of the observed frequency of this substitution to the frequency
expected by chance, and is usually expressed on a logarithmic scale.
A higher-level PAM matrix is calculated by successive multiplications
of the 1-PAM mutation probability matrix, before the logarithm is
taken. For example, a 3-PAM matrix is the 1-PAM matrix raised to the
power of three. However, because a residue may mutate to another
residue and then revert to the original, or may mutate more than
once, an X-PAM matrix does not imply an X % expected difference
between the sequences to be aligned. For example, the 250-PAM matrix
assumes a 20 % similarity, while the 80-PAM matrix assumes a 50 %
similarity between the sequences to be aligned. In Table 1, we show
the 250-PAM matrix, which is popularly used for aligning distant
sequences.
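The extrapolation from a 1-PAM matrix to higher PAM distances can be sketched as follows. This is a minimal illustration on a hypothetical three-letter alphabet with made-up background frequencies, not Dayhoff's data. The helper names (`one_pam_matrix`, `pam`, `expected_identity`, `log_odds`) are introduced here for the example, and the scaling 10*log10 is one common convention for the logarithmic scores.

```python
import math

# Made-up background frequencies for a hypothetical 3-letter alphabet.
FREQ = [0.5, 0.3, 0.2]


def one_pam_matrix(freq, rate=0.01):
    """Build a toy 1-PAM mutation probability matrix.

    M[i][j] is the probability that residue i is replaced by residue j.
    Off-diagonal entries are proportional to the target frequency and are
    scaled so that the expected fraction of changed residues is `rate`.
    """
    c = rate / (1.0 - sum(f * f for f in freq))
    n = len(freq)
    return [[1.0 - c * (1.0 - freq[i]) if i == j else c * freq[j]
             for j in range(n)] for i in range(n)]


def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def pam(m1, n_pam):
    """Raise the 1-PAM probability matrix to the power n_pam."""
    result = m1
    for _ in range(n_pam - 1):
        result = mat_mul(result, m1)
    return result


def expected_identity(m, freq):
    """Expected fraction of positions that still show the same residue."""
    return sum(f * m[i][i] for i, f in enumerate(freq))


def log_odds(m, freq):
    """Log-odds scores 10*log10(M[i][j] / f_j), taken after extrapolation."""
    n = len(freq)
    return [[10.0 * math.log10(m[i][j] / freq[j]) for j in range(n)]
            for i in range(n)]


M1 = one_pam_matrix(FREQ)
for n_pam in (1, 50, 250):
    identity = expected_identity(pam(M1, n_pam), FREQ)
    print(n_pam, round(identity, 3))
```

Because repeated and reverted mutations are absorbed by the matrix power, the expected identity falls more slowly than 1 % per PAM unit. In this toy model it levels off near the chance level set by the three-letter alphabet, so the absolute numbers differ from those of real 20-letter PAM matrices.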
2.3 Block Substitution Matrix
The Block Substitution Matrix, or BLOSUM for short, is also designed
for scoring protein alignments [6]. The idea is similar to that of PAM,
but BLOSUM matrices use a larger amount of sequence data and
consider local alignment blocks or highly conserved regions rather
than independent residue alignments. BLOSUM matrices are calculated
by processing sequences with different degrees of similarity. For
example, the BLOSUM62 matrix is generated from blocks in which
sequences that are at least 62 % identical are clustered and counted
as a single sequence. BLOSUM matrix entries M_ij are calculated using:

\[ M_{ij} = \frac{1}{\lambda} \log \frac{p_{ij}}{q_i \, q_j}, \]

where p_ij is the probability of observing a substitution between
amino acids i and j; q_i and q_j are the probabilities of observing i and j,