The problem of computational protein design was first presented in 1983 as the so-called
inverse protein-folding problem. 9 Whereas the task in protein folding and structure
prediction is to derive a protein's tertiary structure given the primary amino acid sequence,
the objective of the inverse folding problem is, given a certain backbone tertiary structure, to
find a sequence that will fold into this template. The inverse folding problem is an
extension of the threading problem in homology modeling. In both cases, a side-chain
placement algorithm is tasked with finding the optimal combination of side-chain
conformations on the template backbone. In threading, only one amino acid, namely that
from the wild-type sequence of the threaded protein, is allowed at each residue position,
whereas in design, all amino acids can be considered at each residue position. For example,
for the full sequence design of a relatively small 100-residue backbone, the side-chain
placement algorithm is tasked with finding the optimal sequence out of the astronomically
large number of 20^100 ≈ 10^130 possible sequences.
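
The size of this sequence space can be checked with a few lines of Python. This is only an illustrative sketch; the function name `log10_sequence_count` is made up for this example. Working in logarithms avoids having to print a 131-digit integer.

```python
import math


def log10_sequence_count(n_positions: int, n_choices: int = 20) -> float:
    """Return log10 of n_choices ** n_positions (size of the sequence space)."""
    return n_positions * math.log10(n_choices)


# Full design: 20 amino acids allowed at each of 100 positions.
print(f"design:    about 10^{log10_sequence_count(100):.0f} sequences")

# Threading: only the wild-type amino acid is allowed, so there is 1 sequence.
print(f"threading: 10^{log10_sequence_count(100, n_choices=1):.0f} sequence")
```

The first line reports an exponent of 130, matching the estimate above. The threading case collapses to a single sequence, which is why it is the easier of the two problems.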
Virtually all side-chain placement algorithms approach this problem by first discretizing
side-chain conformational space into a library of so-called rotamers for each amino acid
type, where each rotamer represents a frequently observed conformation for that amino
acid. This approach can be justified by the observation that amino acid side-chains prefer a
limited number of low-energy conformations in high-resolution crystal structures of natural
proteins. The library of rotamers for each amino acid type can thus be derived from
statistical analysis of protein crystal structures, 10 and as a simple rule of thumb, the rotamer
library for a side-chain with n chi angles will contain 3^n rotamers (one trans and two
gauche conformations for each chi angle). For example, the rotamer library for valine, a
residue with 1 chi angle, contains 3 rotamers, whereas the
rotamer library for lysine (4 chi angles) would contain 81 rotamers. The combined rotamer
library for all 20 canonical amino acids contains 367 rotamers by this rule of thumb.
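
The 3^n rule of thumb can be illustrated by enumerating every combination of three idealized values per chi angle. The sketch below is an assumption-laden toy: the angles 60, 180 and -60 degrees are idealized gauche/trans values rather than statistics from a real rotamer library, and `rule_of_thumb_rotamers` is a hypothetical helper name. It does not attempt to reproduce the combined total of 367, which depends on the chi-angle count assigned to each amino acid.

```python
from itertools import product

# Idealized chi-angle values in degrees (assumption): gauche+, trans, gauche-.
IDEAL_CHI_VALUES = (60.0, 180.0, -60.0)


def rule_of_thumb_rotamers(n_chi: int) -> list:
    """Enumerate all 3**n_chi combinations of idealized chi angles."""
    return list(product(IDEAL_CHI_VALUES, repeat=n_chi))


valine = rule_of_thumb_rotamers(1)   # 1 chi angle
lysine = rule_of_thumb_rotamers(4)   # 4 chi angles

print(len(valine), len(lysine))  # 3 81
print(lysine[0])                 # (60.0, 60.0, 60.0, 60.0)
```

Each rotamer is represented here simply as a tuple of chi angles. Libraries derived from crystal structures typically store observed angle values and frequencies instead of idealized ones.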
Using the rotamer concept, the side-chain placement algorithm's task of finding the optimal
sequence for a given backbone can be formulated more specifically, namely as the task of
finding the set of rotamers that give the lowest energy (as determined by the energy
function used) when placed on the template backbone. This lowest energy conformation is
often referred to as the GMEC (global minimum energy conformation). Thus, for the
aforementioned 100 residue case, with 367 rotamers being allowed at every position, the
side-chain placement algorithm needs to select a combination of rotamers out of
367^100 ≈ 10^256 total possibilities. Even for a smaller problem, such as the redesign of a
20-residue binding site, there are 367^20 ≈ 10^51 possible combinations. The simplest imaginable
side-chain placement algorithm would be a brute-force approach that simply enumerates all
possible rotamer combinations, scores each of them with the energy function, and
remembers the GMEC. However, such an enumerative algorithm is evidently impractical
considering the large number of possible solutions for even small design problems.
Assuming that assembling and scoring one conformation takes a millisecond on modern
computer hardware, an enumerative algorithm would take about 10^48 seconds to find the GMEC
for the above-presented hypothetical 20 residue binding site design problem, which is
roughly 31 orders of magnitude longer than the estimated age of the universe. The most
important aspect of a viable side-chain placement algorithm is thus its ability to reduce the
combinatorial complexity of the problem and select a low-energy rotamer combination
within a short amount of time. A detailed comparison of several algorithms developed for
this purpose was carried out by Mayo et al. 11 Today, the most commonly employed ones are
Monte Carlo algorithms, 12 and the so-called FASTER algorithm. 13
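
The contrast between exhaustive enumeration and a stochastic search can be sketched on a toy problem. Everything in the example below is an assumption made for illustration: the energies are random numbers rather than a real energy function, the energy is assumed to decompose into self and pairwise terms, the Monte Carlo routine is a plain Metropolis scheme with a fixed temperature, and all function names are hypothetical. It is not the implementation used by any particular design program.

```python
import itertools
import math
import random


def make_toy_energies(n_pos, n_rot, seed=0):
    """Random self and pair energies: made-up numbers, not a real energy function."""
    rng = random.Random(seed)
    self_e = [[rng.uniform(-1.0, 1.0) for _ in range(n_rot)] for _ in range(n_pos)]
    pair_e = {
        (i, j): [[rng.uniform(-1.0, 1.0) for _ in range(n_rot)] for _ in range(n_rot)]
        for i in range(n_pos)
        for j in range(i + 1, n_pos)
    }
    return self_e, pair_e


def total_energy(assignment, self_e, pair_e):
    """Energy of one rotamer choice per position (assumed pairwise-decomposable)."""
    energy = sum(self_e[i][r] for i, r in enumerate(assignment))
    energy += sum(t[assignment[i]][assignment[j]] for (i, j), t in pair_e.items())
    return energy


def brute_force_gmec(self_e, pair_e):
    """Enumerate every rotamer combination and keep the lowest-energy one."""
    n_pos, n_rot = len(self_e), len(self_e[0])
    best = min(
        itertools.product(range(n_rot), repeat=n_pos),
        key=lambda a: total_energy(a, self_e, pair_e),
    )
    return best, total_energy(best, self_e, pair_e)


def monte_carlo(self_e, pair_e, steps=20000, kT=0.5, seed=1):
    """Metropolis search: change one random position per step, track the best seen."""
    rng = random.Random(seed)
    n_pos, n_rot = len(self_e), len(self_e[0])
    current = [rng.randrange(n_rot) for _ in range(n_pos)]
    current_e = total_energy(current, self_e, pair_e)
    best, best_e = tuple(current), current_e
    for _ in range(steps):
        pos = rng.randrange(n_pos)
        old_rot = current[pos]
        current[pos] = rng.randrange(n_rot)
        new_e = total_energy(current, self_e, pair_e)
        # Accept downhill moves always, uphill moves with Boltzmann probability.
        if new_e <= current_e or rng.random() < math.exp(-(new_e - current_e) / kT):
            current_e = new_e
            if current_e < best_e:
                best, best_e = tuple(current), current_e
        else:
            current[pos] = old_rot  # reject: restore the previous rotamer
    return best, best_e


def log10_brute_force_seconds(n_rot, n_pos, seconds_per_eval=1e-3):
    """log10 of the time needed to score all n_rot ** n_pos combinations."""
    return n_pos * math.log10(n_rot) + math.log10(seconds_per_eval)


self_e, pair_e = make_toy_energies(n_pos=6, n_rot=5)   # 5^6 = 15625 combinations
gmec, gmec_e = brute_force_gmec(self_e, pair_e)
mc_best, mc_e = monte_carlo(self_e, pair_e)
print(f"brute force: {gmec} E={gmec_e:.3f}")
print(f"Monte Carlo: {mc_best} E={mc_e:.3f}")
print(f"367 rotamers, 20 positions: about 10^{log10_brute_force_seconds(367, 20):.0f} s")
```

On a problem this small the enumeration finishes quickly and provides a reference answer, so the Monte Carlo result can be compared against the true GMEC. Unlike enumeration, the stochastic search offers no guarantee of reaching the GMEC; what it buys is a run time that no longer scales with the full number of combinations.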
COMPUTATIONAL DESIGN OF PROTEIN-PROTEIN INTERACTIONS
Protein-protein interactions are involved in a large number of cellular processes from
signal transduction to differentiation to apoptosis and others. Being able to create new or