Biomedical Engineering Reference
7.5.2 Finding the Optimal Context
In our approach, we rely on finding several matches, that is, several sub-sequences that are
similar to a given pattern (within a certain tolerance that defines the maximal
deviation from the given pattern). There is a trade-off between the length of the pattern,
the tolerance, and the expected number of matches of this pattern in the database. The
longer the pattern, the less likely it is to find a match in the database. The larger the
tolerance, the easier it is to find a match. Both dependencies are exponential: if the
sequence is over an alphabet with $k$ letters and the pattern length
is $l$, then there are $k^l$ different patterns of length $l$, and if the database contains
$M$ patterns, the expected number of sub-sequences that match the given pattern is
$E = M/k^l$. To do any statistical analysis on the matches found, we need $E$ to be of
sufficient size. To increase $E$ for a fixed $l$, we can reduce the size of the alphabet. That
is, instead of requesting a match with the given amino acids (there are 20 different
amino acids), we might only request that the corresponding amino acid is in the same
class of amino acids (for example, we might place every amino acid into one of only two
classes, such as hydrophilic and hydrophobic). The following two cases demonstrate the potential
advantage of such an approach.
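Before turning to the cases, the relationship between database size, alphabet size, and pattern length can be sketched in a few lines of Python. This is only an illustration: the function name `expected_matches` and its parameter names are hypothetical, and the estimate implicitly assumes that all letters of the alphabet are equally likely and independent, which real protein sequences only approximate.

```python
def expected_matches(database_size: int, alphabet_size: int, pattern_length: int) -> float:
    """Estimate E = M / k**l.

    Assumption (not stated in the text): letters are uniformly distributed
    and independent, so each of the k**l patterns is equally likely.
    """
    return database_size / alphabet_size ** pattern_length


if __name__ == "__main__":
    M = 10**7  # approximate number of amino acids in the PDB, as quoted in Case 1
    for k in (20, 2):          # full amino acid alphabet vs. a two-class alphabet
        for l in (3, 5, 7):    # illustrative pattern lengths
            print(f"k={k:2d}, l={l}: E = {expected_matches(M, k, l):.3g}")
```

Running the loop shows how quickly the estimate drops as the pattern grows over the 20-letter alphabet, and how much slower it drops over a two-letter alphabet.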
Case 1: We look for all amino acid sub-sequences in the PDB database that match a
given sequence of five amino acids. Given that the PDB database currently
contains about $10^7$ amino acids and that there are $20^5 = 3.2 \times 10^6$ different amino
acid sequences of length 5, we can expect to find about three matches.
Case 2: We look for all amino acid sub-sequences in the PDB database that match
a given pattern that has a given amino acid in its central position and specifies to
its left and right a pattern of hydrophobicity of length 3 (i.e., we specify in which
positions we expect to have a hydrophobic amino acid and in which positions we
expect to have a hydrophilic amino acid). There are $20 \times 2^3 \times 2^3 = 1280$ such patterns of length
7, and thus we can expect to find about $10^4$ matches in the PDB database, a
much better basis for statistical analysis than in Case 1.
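The Case 2 pattern (a fixed central amino acid flanked by hydrophobicity patterns) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the names `HYDROPHOBIC`, `hydro_class`, `count_patterns`, and `find_matches` are hypothetical, and the particular split of residues into hydrophobic and hydrophilic is an assumption, since the text does not say which classification is used.

```python
from typing import List

# Assumed two-class split (one-letter codes); the text does not specify it.
HYDROPHOBIC = frozenset("AVLIMFWC")


def hydro_class(residue: str) -> str:
    """Map a residue to 'H' (hydrophobic) or 'P' (hydrophilic)."""
    return "H" if residue in HYDROPHOBIC else "P"


def count_patterns(flank: int = 3, n_amino_acids: int = 20, n_classes: int = 2) -> int:
    """Number of distinct patterns: one exact center, two flanks of class letters."""
    return n_amino_acids * n_classes ** (2 * flank)


def find_matches(sequence: str, left: str, center: str, right: str) -> List[int]:
    """Return indices of `center` residues whose flanks match the H/P patterns."""
    classes = "".join(hydro_class(r) for r in sequence)
    n_left, n_right = len(left), len(right)
    hits = []
    # Only positions with complete flanks on both sides can match.
    for i in range(n_left, len(sequence) - n_right):
        if (sequence[i] == center
                and classes[i - n_left:i] == left
                and classes[i + 1:i + 1 + n_right] == right):
            hits.append(i)
    return hits


if __name__ == "__main__":
    print(count_patterns())            # number of Case 2 patterns
    print(10**7 / count_patterns())    # expected matches for about 10**7 residues
    print(find_matches("AVLGKDEAVLGKDE", left="HHH", center="G", right="PPP"))
```

Matching on classes in the flanks while keeping the central residue exact is what shrinks the pattern space from $20^7$ to 1280 for a window of length 7.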
7.5.3 Develop Interactive Visualization Tools that Allow Fast
Correction of Incorrect or Improbable Predictions
Several open-source visualization tools are available that allow visualization
and rotation of proteins in a wide variety of modes. For our application, we would like
to develop interactive visualization tools that present the histogram of an angle as soon
as the cursor is pointing at it. Then, it should be possible to point at the histogram and
change the corresponding dihedral angles (arrows in Figure 7.12). The visualization
tool should also have the option of highlighting amino acids of particular interest and
positions that might be involved in hydrogen bonds.
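The data behind such a histogram display could be prepared along the following lines. This is only a sketch under stated assumptions: the function name `dihedral_histogram`, the 10-degree bin width, and the convention that angles are given in degrees and wrapped into [-180, 180) are all illustrative choices, not taken from the text, and the interactive display itself is not shown.

```python
from typing import List, Sequence


def dihedral_histogram(angles_deg: Sequence[float], bin_width: float = 10.0) -> List[int]:
    """Count angles per bin over [-180, 180).

    Angles are wrapped into that range first, so 180 falls into the
    same bin as -180. Bin 0 starts at -180 degrees.
    """
    n_bins = int(round(360.0 / bin_width))
    counts = [0] * n_bins
    for angle in angles_deg:
        wrapped = (angle + 180.0) % 360.0            # shift into [0, 360)
        index = min(int(wrapped // bin_width), n_bins - 1)
        counts[index] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical angles collected from database matches for one position.
    sample = [-180.0, -175.0, 0.0, 5.0, 179.9, 180.0]
    histogram = dihedral_histogram(sample)
    print(len(histogram), sum(histogram))
```

A visualization tool could compute one such list per dihedral angle in advance and simply look it up when the cursor moves over that angle.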
7.5.4 Extracting Structure Similarity Based on Dihedral Angles
Representing the structures almost entirely by sequences of dihedral angles
opens the option of looking for similar structures, that is, similar sequences of dihedral