where d is the distance between each of the N pairs of equivalent atoms in two optimally
superposed structures. Calculating rmsd first requires defining the range of the alignment,
within which the matching of atoms (the establishment of equivalent atoms in the two
structures) is determined; this is computationally a much harder problem than the
alignment of sequences in one dimension. Once the equivalence of atoms is established, the
optimal superposition has to be found, which is carried out by straightforward
algorithms such as that of Kabsch [26].
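The superposition step can be illustrated with a minimal sketch. It assumes that the equivalences are already fixed (row i of one coordinate array matches row i of the other), that rmsd has the usual form sqrt((1/N) * sum of d_i^2), and that NumPy is available. The function name kabsch_rmsd is hypothetical, and the SVD-based formulation is a common textbook way of writing a Kabsch-style fit, not necessarily the exact procedure of [26].

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """rmsd between two (N, 3) coordinate arrays after optimal superposition.

    Assumption: row i of P is equivalent to row i of Q.
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    # Remove the translation by centring both sets on their centroids.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    # Covariance matrix of the two centred sets and its SVD.
    H = Pc.T @ Qc
    U, _, Vt = np.linalg.svd(H)
    # Guard against an improper rotation (a reflection).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    # Rotate P onto Q and measure the residual distances d_i.
    diff = Pc @ R.T - Qc
    return float(np.sqrt((diff ** 2).sum() / len(P)))

# Toy example: Q is a rotated and translated copy of P.
theta = 0.7
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta),  np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
P = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0],
              [0.0, 2.0, 0.0], [0.0, 0.0, 3.0]])
Q = P @ Rz.T + np.array([5.0, -2.0, 1.0])
print(kabsch_rmsd(P, Q))  # numerically close to 0
```

Because Q differs from P only by a rigid-body motion, the fitted rmsd is zero up to rounding; any real conformational difference gives a positive value.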
If the equivalences are fixed, then rmsd can be considered as a simple distance that
can be computed with a straightforward algorithm. This is the case for instance when one
compares different conformations of the same protein such as produced by NMR methods.
In this case the equivalences of the atoms are a priori known, since each conformation
consists of the same atoms. The rmsd is 0 for identical structures (identical conformations)
while its value increases as the two structures become more divergent. In fact, rmsd values
are considered reliable indicators of structural variability when applied to very similar
proteins (say rmsd < 5-6 Å). But even in this case, the rmsd value obviously depends on the
number of residues N included in the structural alignment. A statistical analysis of a large
number of structures showed that the dependence can be described as:
rmsd = rmsd_{100} \left( 1 + \ln \sqrt{N / 100} \right)    [7]
where rmsd_100 is a constant, an rmsd value standardized to 100 residues [27]. The rmsd
values also depend on the crystallographic resolution, which is more difficult to take into
consideration (Carugo, 2002). As a result, rmsd does not behave as a metric distance for
divergent structures, so it cannot be used by itself for automated clustering. Clearly, an rmsd
value of, say, 3 Å has a different significance for proteins of 500 residues and for those of
50 residues, so, for example, the structural variability of fold types cannot be easily compared
(rmsd_100, on the other hand, may be useful for such comparisons [27]). In other words, rmsd is
a good indicator of structural identity, but less so of structural divergence.
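The size dependence can be made concrete with a small sketch. It assumes the relation has the form rmsd = rmsd_100 * (1 + ln sqrt(N/100)), so that rmsd_100 is obtained by dividing the observed rmsd by the bracketed factor. The function name rmsd100 and the error handling are hypothetical additions for illustration.

```python
import math

def rmsd100(rmsd, n_residues):
    """Rescale an rmsd observed over n_residues to a 100-residue reference.

    Assumes rmsd = rmsd_100 * (1 + ln(sqrt(N / 100))).
    """
    if n_residues <= 0:
        raise ValueError("n_residues must be positive")
    scale = 1.0 + math.log(math.sqrt(n_residues / 100.0))
    if scale <= 0.0:
        # The factor is non-positive for N <= 100 / e**2 (about 13 residues),
        # where this form of the normalization is not meaningful.
        raise ValueError("alignment too short for this normalization")
    return rmsd / scale

# The same 3 Å rmsd measured over alignments of different lengths.
for n in (50, 100, 500):
    print(n, rmsd100(3.0, n))
```

Under this assumption, an rmsd of 3 Å over 500 residues rescales to a value below 3 Å, while the same rmsd over 50 residues rescales to a value above 3 Å, which is the sense in which the two cases differ in significance.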
The algorithms for calculating rmsd are beyond our scope; the reader is referred to
recent reviews [28]. The philosophy of the calculation depends on whether or not the
alignment, i.e. the set of equivalences between residues (represented by their Cα atoms), is known. If
it is, the very popular algorithm of Kabsch [26] and McLachlan (1978) can be used. If it is not,
and the two 3-D models being compared are too different, there are two
alternatives: either a partial alignment is available, or no a priori assumptions can be made.
In the first case, a few equivalences between atom pairs are assumed and then extended
(and sometimes rejected) through dynamic programming techniques [29]. In the second case,
an exhaustive search is performed by rotating and translating one 3-D model over the other in
six dimensions, three rotational and three translational (Diedrichs, 1995).
It has to be noted that the superposition of divergent protein 3-D structures is often a
quite arbitrary exercise, and different superposition algorithms may lead to completely
different results. An effective, recently proposed procedure for reconciling different structural
alignment procedures consists of iteratively reducing the number of aligned Cα atom
pairs [30]. After each superposition, the worst pair is eliminated and a new superposition is
performed, eventually leading to the identification of the protein core that shows a
significant degree of similarity.
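The iterative reduction described above can be sketched as follows. This is only an illustration under stated assumptions, not the published procedure of [30]: the names superpose and prune_to_core are hypothetical, the superposition is a generic SVD-based least-squares fit, and the stopping rule (an rmsd cutoff plus a minimum number of pairs) is an assumption, since the text does not specify how a significant degree of similarity is decided.

```python
import numpy as np

def superpose(P, Q):
    """Return P after a least-squares rigid-body fit onto Q (Kabsch-style)."""
    Pc, Qc = P - P.mean(axis=0), Q - Q.mean(axis=0)
    U, _, Vt = np.linalg.svd(Pc.T @ Qc)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # avoid reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return Pc @ R.T + Q.mean(axis=0)

def prune_to_core(P, Q, rmsd_cutoff=0.5, min_pairs=3):
    """Drop the worst-fitting pair after each superposition.

    rmsd_cutoff and min_pairs are assumed stopping criteria.
    Returns the indices of the retained pairs and their rmsd.
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    keep = np.arange(len(P))
    while True:
        fitted = superpose(P[keep], Q[keep])
        dist = np.linalg.norm(fitted - Q[keep], axis=1)
        rmsd = float(np.sqrt((dist ** 2).mean()))
        if rmsd <= rmsd_cutoff or len(keep) <= min_pairs:
            return keep, rmsd
        # Eliminate the pair with the largest residual distance and refit.
        keep = np.delete(keep, np.argmax(dist))

# Toy data: 20 points, rigidly moved, with two pairs deliberately displaced.
rng = np.random.default_rng(0)
P = rng.uniform(0.0, 10.0, size=(20, 3))
theta = 0.4
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta),  np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
Q = P @ Rz.T + np.array([3.0, 1.0, -2.0])
Q[3] += np.array([12.0, 0.0, 0.0])
Q[11] += np.array([0.0, -9.0, 0.0])
core, core_rmsd = prune_to_core(P, Q)
print(sorted(core.tolist()), core_rmsd)
```

In this toy case the two displaced pairs carry by far the largest residuals, so they are the ones discarded, and the remaining pairs superpose almost exactly. With real, divergent structures the outcome depends strongly on the chosen cutoff.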
Finally, we mention that the rmsd distance does not allow the costing of gaps. For
this reason, it cannot be used directly to find an optimal alignment between two
arbitrary proteins.