Comparison of sequences, protein 3D structures and genomes - Essays in Bioinformatics


hydrophobicity plots have been used to recognize amphipathic helices as well as to build

classifiers for various protein groups. These applications are reviewed in [28].

2. Comparison of 3D structures

Comparison of 3D structures is used in a variety of fields such as fold recognition, structural

evolution studies and drug design, and the protocols are as diverse as the fields themselves.

For example, when comparing 3D structures of the same protein molecule determined by NMR

methods, all equivalent atom pairs are known a priori and can be used in the

comparison. In contrast, determination of folds is based on the backbone Cα atoms only

and the equivalences have to be determined by the calculation itself. In this section we will

briefly summarize the similarity/distance functions used for backbone comparison,

concentrating on the measures themselves rather than on the goal and/or

implementation of the actual algorithms. In the majority of the cases, the approach used for

structural alignments is quite similar to that used in sequence analysis (finding alignment

paths in a distance matrix or optimizing the range by successive omissions or additions).

This is because 3D structures can be compared in terms of their (overlapping) peptide

fragments, and a series of peptide fragments is a linear, sequence-like representation. For

example, one can compute an rmsd between the peptide fragments of two proteins and

construct a distance-matrix with the resulting values [29,30]. But there are many ways to

represent peptide fragments as vectors, and then one can use any of the vector-distance

formulas to produce the values of the distance matrix. For example, vectors of torsional

angles [31,32] and curvature and torsion parameters of peptide fragments [33,34] were

used by early comparison methods, as reviewed by Orengo [35]. More recent methods

include structural alphabets based on dihedral angles [36,37] or on distance

geometry [38,39]. In the latter method, the size of the alphabet (the minimum number of

fragments necessary to describe the observed data) is 27, a value derived by statistical

optimisation. The similarity search is then carried out by Smith-Waterman alignment.
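
The fragment-based distance matrix mentioned above can be illustrated with a minimal Python sketch. It is not the procedure of [29,30]; it assumes that Cα coordinates are available as NumPy arrays, uses an arbitrary fragment width of 5 residues, and obtains the rmsd of each fragment pair after a Kabsch superposition. The function names (kabsch_rmsd, fragment_distance_matrix, ideal_helix, zigzag_strand) and the synthetic helix and strand coordinates are hypothetical and serve only as an example.

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """RMSD between two (n, 3) coordinate sets after optimal superposition."""
    P = P - P.mean(axis=0)          # remove translation
    Q = Q - Q.mean(axis=0)
    U, S, Vt = np.linalg.svd(P.T @ Q)
    # Flip the smallest singular value if the best fit would be a reflection.
    S[-1] *= np.sign(np.linalg.det(U @ Vt))
    # Minimal sum of squared deviations over all proper rotations.
    e0 = (P ** 2).sum() + (Q ** 2).sum()
    msd = max((e0 - 2.0 * S.sum()) / len(P), 0.0)
    return float(np.sqrt(msd))

def fragment_distance_matrix(ca_a, ca_b, width=5):
    """Matrix of rmsd values between all overlapping fragments of two chains."""
    na = len(ca_a) - width + 1
    nb = len(ca_b) - width + 1
    D = np.empty((na, nb))
    for i in range(na):
        for j in range(nb):
            D[i, j] = kabsch_rmsd(ca_a[i:i + width], ca_b[j:j + width])
    return D

def ideal_helix(n, radius=2.3, rise=1.5, turn_deg=100.0):
    """Synthetic helix-like C-alpha trace (illustrative parameters)."""
    t = np.deg2rad(turn_deg) * np.arange(n)
    return np.column_stack([radius * np.cos(t), radius * np.sin(t),
                            rise * np.arange(n)])

def zigzag_strand(n, step=3.3, offset=0.9):
    """Synthetic extended, strand-like C-alpha trace (illustrative parameters)."""
    i = np.arange(n)
    return np.column_stack([step * i, offset * (-1.0) ** i, np.zeros(n)])

# A helix compared with a rotated and translated copy of itself,
# and with an extended strand.
helix = ideal_helix(12)
a = np.deg2rad(40.0)
R = np.array([[np.cos(a), -np.sin(a), 0.0],
              [np.sin(a),  np.cos(a), 0.0],
              [0.0,        0.0,       1.0]])
moved = helix @ R.T + np.array([5.0, -3.0, 2.0])

D_same = fragment_distance_matrix(helix, moved)
D_diff = fragment_distance_matrix(helix, zigzag_strand(12))
print(D_same.shape, D_same.max(), D_diff.min())
```

In this sketch the rigid-body copy gives values close to zero throughout the matrix, whereas the helix and strand fragments give clearly larger values. An alignment path could then be searched in such a matrix by dynamic programming, in the same way as in sequence analysis.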

The similarity measures described in this section can be classified according to the

use of atomic (residue-based) descriptions, or higher-order descriptions such as secondary

structure elements. Another important difference is that some of the methods can be used to

produce structural alignments while others are only preliminary filters indicating similarity

without providing a structural alignment.

Methods based on superposition of atoms use the rmsd distance (section x, above).

Even though the results of atom superposition methods are generally considered superior to

most computational alternatives, and very low rmsd values are indicative of identical

structures, rmsd can be used only with caution as a quantitative indicator of similarity. In

addition, there is no accepted and reliable statistical model that would allow rmsd to be used as

a probabilistic score with a statistical significance; moreover, rmsd does not penalize gaps.

Therefore a number of alternative similarity scores have been developed for obtaining

optimal structural alignments even though the final results are always characterized in

terms of the rmsd score.

One group of similarity scores is based on vectors or sets of vectors assigned to

each position within a protein structure. The parameters of the vector represent various

features. Methods developed by Taylor and Orengo [40,41] assigned a set of intramolecular

Cα-Cα vectors to each residue position, or used various geometric features as parameters of

the vector assigned to each residue position. As a result, a protein structure was converted

into a series of residue vectors, and two structures could be compared to give a so-called

residue matrix in which the elements are calculated as a vectorial difference (city-block
