than superposing a large number of Cα atoms, so one can use algorithms that could not cope
with structures at full atomic detail. In addition, SSEs incorporate additional knowledge of
molecular geometry. The success of the process depends on i) how secondary structures are
assigned; ii) how the similarity between two secondary structural elements of two proteins
is estimated; iii) how the overall similarity between the two proteins is defined.
Although the SSEs (at least the most common ones, such as helices and strands) are clearly
defined, different assignment algorithms produce different assignments [44-46].
Consequently, different representations of the same protein structure may arise. A further
problem is which SSE types are considered. Very often a two-state classification is used:
helix (including 3/10 and pi helices) and strand. There are nevertheless exceptions. Orengo et al.
[44-46], for example, adopt a three-state classification: alpha-helix, 3/10-helix, and strand.
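To make the two- and three-state classifications concrete, the following minimal sketch reduces a per-residue secondary structure string to a list of SSE segments. It assumes DSSP-style one-letter codes (H, G, I, E), which the text does not prescribe; the mapping tables, the minimum segment length, and the function name `extract_sses` are illustrative assumptions, not part of any of the cited methods.

```python
from itertools import groupby

# Assumed DSSP one-letter codes: H = alpha helix, G = 3/10 helix,
# I = pi helix, E = extended strand. Any other code is treated as coil ("-").
TWO_STATE = {"H": "H", "G": "H", "I": "H", "E": "E"}
# Three-state variant (alpha, 3/10, strand); folding pi helices into the
# alpha-helix class is an assumption made only for this sketch.
THREE_STATE = {"H": "H", "G": "G", "I": "H", "E": "E"}


def extract_sses(dssp, mapping, min_len=3):
    """Return (type, start, end) tuples (0-based, inclusive) for SSE runs.

    Consecutive residues with the same reduced class form one element;
    runs shorter than min_len (a hypothetical cutoff) are discarded.
    """
    sses = []
    pos = 0
    reduced = (mapping.get(code, "-") for code in dssp)
    for sse_type, run in groupby(reduced):
        length = len(list(run))
        if sse_type != "-" and length >= min_len:
            sses.append((sse_type, pos, pos + length - 1))
        pos += length
    return sses


dssp = "--HHHHGGG--EEEE-E--IIIII"
print(extract_sses(dssp, TWO_STATE))    # → [('H', 2, 8), ('E', 11, 14), ('H', 19, 23)]
print(extract_sses(dssp, THREE_STATE))  # → [('H', 2, 5), ('G', 6, 8), ('E', 11, 14), ('H', 19, 23)]
```

The same assignment string yields three elements under the two-state scheme and four under the three-state scheme, which illustrates how the choice of classification changes the representation seen by every later comparison step.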
The similarity between secondary structural elements in two proteins is usually
estimated by comparing each pair of SSEs of one protein with each pair of the other. The
3D arrangement of two secondary structural elements in a protein is usually defined by
their distance, their plane angle, and their torsion. A similarity score can then be computed
for every combination of an SSE pair from one protein with an SSE pair from the other. The resulting matrix of similarity scores
can then be scrutinized with dynamic programming techniques [41,47-49], treated as a
maximum clique problem [50], with pseudo-distance matrices [51], or with cluster analysis
[52]. The alignment of the secondary structural elements is eventually followed by a
superposition of the Cα atoms, starting from an initial structural alignment derived from the
secondary structure alignment. The overall similarity between the two structures can
then be estimated from the rmsd values [50] or with more sophisticated figures of
merit that also consider the quality of the secondary structure fit.
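The pair geometry described above can be sketched as follows, approximating each SSE by its axis (a start and an end point). This is one plausible set of definitions, not the one used by any specific cited program: the distance is taken between axis midpoints, the angle between axis directions, and the torsion as the dihedral between the two axes viewed along the line joining the midpoints. The Gaussian-style score in `pair_similarity` and its tolerances are hypothetical.

```python
import math

# An SSE is approximated by its axis: a (start, end) pair of 3D points.


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _unit(a):
    n = math.sqrt(_dot(a, a))
    return (a[0] / n, a[1] / n, a[2] / n)


def pair_geometry(sse1, sse2):
    """Return (distance, angle in degrees, torsion in degrees) for two SSE axes."""
    (s1, e1), (s2, e2) = sse1, sse2
    m1 = tuple((p + q) / 2 for p, q in zip(s1, e1))
    m2 = tuple((p + q) / 2 for p, q in zip(s2, e2))
    u1, u2 = _unit(_sub(e1, s1)), _unit(_sub(e2, s2))
    link = _sub(m2, m1)
    distance = math.sqrt(_dot(link, link))  # assumes the midpoints differ
    angle = math.degrees(math.acos(max(-1.0, min(1.0, _dot(u1, u2)))))
    # Torsion: project both axis directions onto the plane perpendicular
    # to the midpoint link, then take the signed angle between them.
    b = _unit(link)
    p1 = _sub(u1, tuple(_dot(u1, b) * x for x in b))
    p2 = _sub(u2, tuple(_dot(u2, b) * x for x in b))
    torsion = math.degrees(math.atan2(_dot(_cross(p1, p2), b), _dot(p1, p2)))
    return distance, angle, torsion


def pair_similarity(g1, g2, sd=2.0, sa=30.0, st=30.0):
    """Score in (0, 1]; tolerances sd (Å), sa and st (degrees) are hypothetical."""
    dd = g1[0] - g2[0]
    da = g1[1] - g2[1]
    dt = (g1[2] - g2[2] + 180.0) % 360.0 - 180.0  # wrap torsion difference
    return math.exp(-((dd / sd) ** 2 + (da / sa) ** 2 + (dt / st) ** 2))


helix_a = ((0.0, 0.0, 0.0), (0.0, 0.0, 10.0))
helix_b = ((10.0, -5.0, 5.0), (10.0, 5.0, 5.0))
print(pair_geometry(helix_a, helix_b))  # distance 10, angle 90, torsion -90 (up to rounding)
```

Evaluating `pair_similarity` for every SSE pair of one protein against every SSE pair of the other fills the similarity matrix that the dynamic programming, clique, or clustering methods cited above then process.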
The fragment-pair approach is also amenable to probabilistic interpretation. The
VAST program of Bryant and coworkers [53,54] provides BLAST-like significance (P) values.
VAST's elementary unit of comparison is a simplified rmsd score resulting from a
superposition of the endpoints of SSE pairs “trimmed” to the same length. First, rmsd values
are converted into log-odds scores using precomputed values from comparisons of SSE pairs
from related and unrelated structures; then a combined score $S_o$ is calculated from the $i$ best
SSE pairs found to match between the query and a database entry. The principle of
converting $S_o$ into a P value is similar to that used by BLAST, given in Eqs. 15-17,
but relies on tabulated statistics rather than on analytical formulae. Let the probability of
finding a substructure of size $i$ with a score $S_i \geq S_o$ be denoted as $P(S_i \geq S_o)$. In VAST, the
value of $P(S_i \geq S_o)$ is estimated as a function of $i$ and $S_i$, using tabulated values resulting from
random comparisons. The expected number $E$ of substructures with a score $S_i \geq S_o$ found by chance
also depends on the size of the search space, which can be defined as the total number of
possible common substructures of $i$ SSEs between the two proteins, a number denoted by
$N_i$. The equation computed by VAST is then
$$E = \sum_i N_i \, P(S_i \geq S_o) \tag{8}$$
The sum is calculated over all values of $i$ using the tabulated $P(S_i \geq S_o)$ values. As
with BLAST, when $E$ is small (e.g., $E < 0.01$) it is approximately equal to the P value. The method is very fast,
thanks to the precomputed statistics, and is accessible at the NCBI web site.
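Equation 8 is a weighted sum and is easy to evaluate once the tables are available. The sketch below assumes the caller has already looked up $P(S_i \geq S_o)$ for the observed score; the search-space sizes and probabilities in the example are invented for illustration, and the conversion $P = 1 - e^{-E}$ is the Poisson relation used by BLAST, applied here as an assumption rather than as VAST's documented procedure.

```python
import math


def vast_expected_count(n_by_size, p_by_size):
    """E = sum_i N_i * P(S_i >= S_o), as in Eq. 8.

    n_by_size[i] -- N_i, number of possible common substructures of i SSEs
    p_by_size[i] -- tabulated P(S_i >= S_o) for the observed score S_o
    """
    return sum(n_i * p_by_size[i] for i, n_i in n_by_size.items())


def p_value_from_expected(e):
    """BLAST-style conversion, assuming chance hits are Poisson distributed."""
    return 1.0 - math.exp(-e)


# Hypothetical search-space sizes and tabulated probabilities.
n_by_size = {2: 1_000, 3: 20_000}
p_by_size = {2: 1e-6, 3: 2e-7}
e = vast_expected_count(n_by_size, p_by_size)
print(e, p_value_from_expected(e))
```

Here $E \approx 0.005$ and $P = 1 - e^{-E}$ differs from $E$ by only about 0.25%, which is why a small $E$ can be read directly as a P value.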
A variety of other procedures that represent the protein 3-D structure as an ensemble
of secondary structural elements have also been proposed. In Martin's approach [55],
secondary structural elements are each assigned a letter from an alphabet that encodes the
secondary structure type, direction, length, and solvent accessibility. Two proteins can
thus be compared with the simple Needleman-Wunsch algorithm. Murthy [56] used dynamic
programming techniques to optimally superpose secondary structural elements.
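Because a letter encoding turns each structure into a short string, a textbook Needleman-Wunsch global alignment applies directly. The sketch below uses single-letter SSE codes with a plain match/mismatch score and a linear gap penalty; the letters, the scores, and the function name `needleman_wunsch` are illustrative assumptions, since the text does not specify Martin's alphabet or scoring scheme. A realistic version would use a richer alphabet and a substitution matrix between letters.

```python
def needleman_wunsch(a, b, match=2, mismatch=-1, gap=-2):
    """Global alignment of two SSE strings; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # F[i][j] = best score for aligning a[:i] with b[:j]
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap

    def sub(x, y):
        return match if x == y else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                          F[i - 1][j] + gap,   # gap in b
                          F[i][j - 1] + gap)   # gap in a

    # Trace back from the bottom-right cell to recover one optimal alignment.
    i, j = n, m
    top, bottom = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(top)), "".join(reversed(bottom))


score, top, bottom = needleman_wunsch("HEEH", "HEH")
print(score)  # → 4
print(top)    # → HEEH
print(bottom) # → H-EH
```

With three matched elements (+6) and one gap (-2), the best global alignment scores 4; ties between equally good alignments are broken here by preferring diagonal moves during traceback.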