distance of vectors, equation [2]). The optimal structural alignment can be determined by a
dynamic programming algorithm.
A roughly similar approach was used by Holm and Sander for the very popular
DALI server [42]. In the underlying method the Cα atoms are characterized by vectors whose
parameters are the elements of a distance matrix. The local vectors are then
compared in terms of residue similarity scores such as
$\phi^R(i,j) = \theta^R - |d^A_{ij} - d^B_{ij}|$    [4]
or
$\phi^E(i,j) = \left(\theta^E - \frac{|d^A_{ij} - d^B_{ij}|}{d^*_{ij}}\right) e^{-(d^*_{ij})^2/\alpha}$    [5]
The superscripts A and B refer to residues in structures A and B, respectively. $d_{ij}$ are the elements of the
hexapeptide distance matrices, i.e. elements of the residue vectors; $d^*_{ij}$ denotes the average of
$d^A_{ij}$ and $d^B_{ij}$; and $\theta^R$, $\theta^E$ and $\alpha$ are constants. Superscript R denotes
rigid comparison [eqn. 4]; E refers to an elastic comparison dampened by a negative
exponential term [eqn. 5]. As can be seen, summing the residue similarity measures $\phi^R$ or
$\phi^E$ results in quantities related to the city block distance. Comparison of two proteins A and
B is then carried out using a distance matrix whose elements are equal to either $\phi^R(i,j)$ or
$\phi^E(i,j)$, where i and j refer to two pairs of structurally aligned residues: i(A), i(B), j(A),
and j(B). The optimization task is to find the set of equivalences between A and B that
maximizes this function; the structural alignment is obtained with a Monte Carlo optimization
algorithm. To improve convergence, various heuristics are used
to obtain a reasonable starting point.
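The rigid and elastic scores of eqns. 4 and 5 can be illustrated with a minimal Python sketch. It is not the DALI implementation: it works on plain residue-level distance matrices rather than hexapeptide matrices, the function names are hypothetical, and the default values of theta_r, theta_e and alpha are placeholders chosen for illustration, since the text only states that these are constants.

```python
import math


def rigid_score(d_a, d_b, theta_r=1.5):
    """Rigid pair score (eqn. 4): theta_R - |dA_ij - dB_ij|."""
    return theta_r - abs(d_a - d_b)


def elastic_score(d_a, d_b, theta_e=0.2, alpha=400.0):
    """Elastic pair score (eqn. 5): relative deviation, damped by
    exp(-(d*_ij)^2 / alpha), where d*_ij is the mean of the two distances."""
    d_star = 0.5 * (d_a + d_b)
    if d_star == 0.0:
        # i == j: no deviation and no damping (assumed convention)
        return theta_e
    deviation = abs(d_a - d_b) / d_star
    return (theta_e - deviation) * math.exp(-(d_star ** 2) / alpha)


def total_score(dist_a, dist_b, equivalences, pair_score):
    """Sum the pair score over all pairs (i, j) of aligned positions.

    equivalences is a list of (index in A, index in B) tuples; dist_a and
    dist_b are square distance matrices given as nested lists.
    """
    total = 0.0
    for i_a, i_b in equivalences:
        for j_a, j_b in equivalences:
            total += pair_score(dist_a[i_a][j_a], dist_b[i_b][j_b])
    return total


if __name__ == "__main__":
    # Two toy "structures" of three residues each (distances in angstroms).
    dist_a = [[0.0, 3.8, 6.0], [3.8, 0.0, 3.8], [6.0, 3.8, 0.0]]
    dist_b = [[0.0, 3.8, 7.0], [3.8, 0.0, 3.8], [7.0, 3.8, 0.0]]
    equivalences = [(0, 0), (1, 1), (2, 2)]
    print(total_score(dist_a, dist_b, equivalences, rigid_score))
    print(total_score(dist_a, dist_b, equivalences, elastic_score))
```

An optimizer such as the Monte Carlo procedure mentioned above would repeatedly propose changes to the list of equivalences and keep those that raise the total.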
The residue similarity score of Levitt and Gerstein [43] has the formula
$S_{ij} = \frac{M}{1 + (d_{ij}/d_0)^2}$    [6]
where $d_{ij}$ is the distance between the Cα atoms of the two structures compared, and M and $d_0$ are
constants. The $S_{ij}$ values are elements of a similarity matrix from which an optimizable
substructure similarity measure $S_{str}$ can be calculated by introducing gaps. The $S_{str}$ score is
defined as
$S_{str} = M \left( \sum_{ij} \frac{1}{1 + (d_{ij}/d_0)^2} - \frac{N_{gap}}{2} \right)$    [7]
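As a small worked illustration of the residue score (eqn. 6) and the substructure score (eqn. 7), the following Python sketch evaluates both for a given list of aligned Cα-Cα distances. The function names are hypothetical, the defaults m=20 and d0=5 are assumed values that the text does not state, and n_gap is taken to be the number of gaps in the alignment.

```python
def residue_similarity(d_ij, m=20.0, d0=5.0):
    """Eqn. 6: S_ij = M / (1 + (d_ij / d0)^2)."""
    return m / (1.0 + (d_ij / d0) ** 2)


def structural_score(distances, n_gap, m=20.0, d0=5.0):
    """Eqn. 7: S_str = M * (sum_ij 1 / (1 + (d_ij / d0)^2) - N_gap / 2).

    distances holds the d_ij of the aligned residue pairs only.
    """
    matched = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return m * (matched - n_gap / 2.0)


if __name__ == "__main__":
    aligned_distances = [0.5, 1.2, 0.8, 4.0, 2.5]  # angstroms, toy values
    print(residue_similarity(0.0))                  # perfectly superposed pair
    print(structural_score(aligned_distances, n_gap=1))
```

A pair that superposes perfectly contributes the full M, a pair at distance d0 contributes M/2, and each gap costs M/2 under this form of the score.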
The structural alignment is carried out with a dynamic programming method such as
the Smith-Waterman algorithm. Levitt and Gerstein found that random structural
similarities determined by this method follow the same extreme value distribution as
BLAST scores (or Smith-Waterman sequence alignment scores), so the results can be
characterized in terms of P values [43].
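The dynamic programming step can be sketched as a Smith-Waterman-style local alignment over a precomputed similarity matrix. This is an illustrative simplification and not the published procedure: the function name is hypothetical, and a constant penalty is charged for every gapped position, which is an assumption rather than the gap treatment of eqn. 7.

```python
def local_align(sim, gap_penalty):
    """Local alignment over a similarity matrix sim[i][j].

    Returns (best score, list of aligned (i, j) index pairs).
    """
    n, m = len(sim), len(sim[0])
    # h[i][j] is the best score of a local alignment ending at (i-1, j-1).
    h = [[0.0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0.0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            h[i][j] = max(
                0.0,                                  # start a new alignment
                h[i - 1][j - 1] + sim[i - 1][j - 1],  # align i-1 with j-1
                h[i - 1][j] - gap_penalty,            # gap in the second structure
                h[i][j - 1] - gap_penalty,            # gap in the first structure
            )
            if h[i][j] > best:
                best, best_pos = h[i][j], (i, j)

    # Trace back from the best cell until the score drops to zero.
    pairs = []
    i, j = best_pos
    while i > 0 and j > 0 and h[i][j] > 0.0:
        if h[i][j] == h[i - 1][j - 1] + sim[i - 1][j - 1]:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif h[i][j] == h[i - 1][j] - gap_penalty:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return best, pairs


if __name__ == "__main__":
    # Toy similarity matrix with a strong diagonal.
    sim = [[10.0, 1.0, 1.0], [1.0, 10.0, 1.0], [1.0, 1.0, 10.0]]
    print(local_align(sim, gap_penalty=5.0))
```

In a structural setting the matrix would be filled with $S_{ij}$ values from a trial superposition, and alignment and superposition would typically be iterated until the set of equivalences stops changing.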
As superposition methods are compute-intensive, a number of simplified
representations have been developed. One general strategy is to represent the protein by a
set of secondary structure elements (SSEs), which are characterized by their position within the
polypeptide sequence and their position in 3D space, and are usually represented as vectors fit
to the Cα atoms. This is another kind of entity-relationship description in which SSEs are
the nodes and a variety of parameters (such as distances, angles, etc.) are used to describe
the relationships. The rationale is that superposition of a few SSEs is less compute-intensive