Computational Protein Design for Synthetic Biology - Synthetic Biology


The problem of computational protein design was first presented in 1983 as the so-called

inverse protein-folding problem. 9 Whereas the task in protein folding and structure

prediction is to derive a protein's tertiary structure given the primary amino acid sequence,

the objective of the inverse folding problem is, given a certain backbone tertiary structure, to

find a sequence that will fold into this template. The inverse folding problem is an

extension of the threading problem in homology modeling. In both cases, a side-chain

placement algorithm is tasked with finding the optimal combination of side-chain

conformations on the template backbone. In threading, only one amino acid, namely that

from the wild-type sequence of the threaded protein, is allowed at each residue position,

whereas in design, all amino acids can be considered at each residue position. For example,

for the full sequence design of a relatively small 100 residue long backbone, the side-chain

placement algorithm is tasked with finding the optimal sequence out of the astronomically

large number of 20^100 ≈ 10^130 possible sequences.
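The size of this sequence space is easiest to check on a logarithmic scale. The following is a minimal sketch; the function name `log10_sequence_count` is made up for illustration and is not part of any design package.

```python
import math


def log10_sequence_count(n_positions: int, n_choices: int = 20) -> float:
    """Return log10(n_choices ** n_positions) without building the huge integer."""
    return n_positions * math.log10(n_choices)


# Full sequence design of a 100 residue backbone with 20 amino acids per position.
exponent = log10_sequence_count(100)
print(f"20^100 is about 10^{exponent:.0f}")  # prints: 20^100 is about 10^130
```

The same helper applies to any number of positions and any number of choices per position.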

Virtually all side-chain placement algorithms approach this problem by first discretizing

side-chain conformational space into a library of so-called rotamers for each amino acid

type, where each rotamer represents a frequently observed conformation for that amino

acid. This approach can be justified by the observation that amino acid side-chains prefer a

limited number of low-energy conformations in high-resolution crystal structures of natural

proteins. The library of rotamers for each amino acid type can thus be derived from

statistical analysis of protein crystal structures, 10 and as a simple rule of thumb, the rotamer

library for a side-chain with n chi angles will contain 3^n rotamers (with 1 rotamer in

trans and 2 rotamers in gauche conformation for each chi angle). For example, the

rotamer library for valine, a residue with 1 chi angle, contains 3 rotamers, whereas the

rotamer library for lysine (4 chi angles) would contain 81 rotamers. The combined rotamer

library for all 20 canonical amino acids contains 367 rotamers by this rule of thumb.
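The rule of thumb can be written down in a few lines. This sketch only covers the two residues whose chi-angle counts are stated above; the function name `rule_of_thumb_rotamers` is hypothetical, and rotamer libraries derived from real structures will generally not match these counts exactly.

```python
def rule_of_thumb_rotamers(n_chi: int) -> int:
    """Three preferred conformations per chi angle gives 3 ** n_chi rotamers."""
    if n_chi < 0:
        raise ValueError("number of chi angles cannot be negative")
    return 3 ** n_chi


# Chi-angle counts taken from the examples in the text.
chi_angles = {"valine": 1, "lysine": 4}

for residue, n_chi in chi_angles.items():
    print(residue, rule_of_thumb_rotamers(n_chi))
# prints:
# valine 3
# lysine 81
```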


Using the rotamer concept, the side-chain placement algorithm's task of finding the optimal

sequence for a given backbone can be formulated more specifically, namely as the task of

finding the set of rotamers that give the lowest energy (as determined by the energy

function used) when placed on the template backbone. This lowest energy conformation is

often referred to as the GMEC (global minimum energy conformation). Thus, for the

aforementioned 100 residue case, with 367 rotamers being allowed at every position, the

side-chain placement algorithm needs to select a combination of rotamers out of

367^100 ≈ 10^256 total possibilities. Even for a smaller problem, such as the redesign of a 20

residue binding site, there are 367^20 ≈ 10^51 possible combinations. The simplest imaginable

side-chain placement algorithm would be a brute-force approach that simply enumerates all

possible rotamer combinations, scores each of them with the energy function, and

remembers the GMEC. However, such an enumerative algorithm is evidently impractical

considering the large number of possible solutions for even small design problems.
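To make the brute-force idea concrete, here is a minimal sketch on a toy problem that is small enough to enumerate. The energy function is an assumption made purely for illustration: random per-rotamer "self" energies plus random pairwise energies between positions. All names (`make_toy_problem`, `total_energy`, `brute_force_gmec`) are hypothetical.

```python
import itertools
import random


def make_toy_problem(n_positions, n_rotamers, seed=0):
    """Build random self and pairwise energy tables (illustrative only)."""
    rng = random.Random(seed)
    self_e = [[rng.uniform(-1, 1) for _ in range(n_rotamers)]
              for _ in range(n_positions)]
    pair_e = {}
    for i in range(n_positions):
        for j in range(i + 1, n_positions):
            pair_e[(i, j)] = [[rng.uniform(-1, 1) for _ in range(n_rotamers)]
                              for _ in range(n_rotamers)]
    return self_e, pair_e


def total_energy(assignment, self_e, pair_e):
    """Sum self energies and pairwise energies for one rotamer combination."""
    energy = sum(self_e[i][r] for i, r in enumerate(assignment))
    for (i, j), table in pair_e.items():
        energy += table[assignment[i]][assignment[j]]
    return energy


def brute_force_gmec(self_e, pair_e):
    """Enumerate every rotamer combination and remember the lowest-energy one."""
    choices = [range(len(row)) for row in self_e]
    best, best_e = None, float("inf")
    for assignment in itertools.product(*choices):
        energy = total_energy(assignment, self_e, pair_e)
        if energy < best_e:
            best, best_e = assignment, energy
    return best, best_e


# 4 positions with 3 rotamers each: only 3 ** 4 = 81 combinations to score.
self_e, pair_e = make_toy_problem(n_positions=4, n_rotamers=3)
gmec, energy = brute_force_gmec(self_e, pair_e)
print("GMEC rotamer indices:", gmec, "energy:", round(energy, 3))
```

The loop visits every element of the Cartesian product, so its running time grows exponentially with the number of positions.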

Assuming that assembling and scoring one conformation takes a millisecond on modern

computer hardware, an enumerative algorithm would take 10^48 seconds to find the GMEC

for the above-presented hypothetical 20 residue binding site design problem, which is

roughly 31 orders of magnitude longer than the estimated age of the universe. The most

important aspect of a viable side-chain placement algorithm is thus its ability to reduce the

combinatorial complexity of the problem and select a low-energy rotamer combination

within a short amount of time. A detailed comparison of several algorithms developed for

this purpose was carried out by Mayo et al. 11 Today, the most commonly employed ones are

Monte Carlo algorithms, 12 and the so-called FASTER algorithm. 13
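As an illustration of the Monte Carlo idea, the following is a generic Metropolis search with simulated annealing on a toy problem. It is a sketch under stated assumptions, not the specific method of the cited references and not the FASTER algorithm: the energy tables are random, the cooling schedule and step count are arbitrary, and all function names are hypothetical.

```python
import math
import random


def make_toy_problem(n_positions, n_rotamers, seed=0):
    """Random self and pairwise energy tables (illustrative only)."""
    rng = random.Random(seed)
    self_e = [[rng.uniform(-1, 1) for _ in range(n_rotamers)]
              for _ in range(n_positions)]
    pair_e = {(i, j): [[rng.uniform(-1, 1) for _ in range(n_rotamers)]
                       for _ in range(n_rotamers)]
              for i in range(n_positions) for j in range(i + 1, n_positions)}
    return self_e, pair_e


def total_energy(assignment, self_e, pair_e):
    """Sum self energies and pairwise energies for one rotamer combination."""
    energy = sum(self_e[i][r] for i, r in enumerate(assignment))
    for (i, j), table in pair_e.items():
        energy += table[assignment[i]][assignment[j]]
    return energy


def monte_carlo_search(self_e, pair_e, n_steps=5000,
                       t_start=2.0, t_end=0.05, seed=1):
    """Metropolis Monte Carlo with a geometric cooling schedule."""
    rng = random.Random(seed)
    n_pos = len(self_e)
    current = [rng.randrange(len(self_e[i])) for i in range(n_pos)]
    current_e = total_energy(current, self_e, pair_e)
    best, best_e = list(current), current_e
    for step in range(n_steps):
        # Temperature decays geometrically from t_start to t_end.
        temp = t_start * (t_end / t_start) ** (step / max(1, n_steps - 1))
        # Propose a random rotamer at one randomly chosen position.
        pos = rng.randrange(n_pos)
        old_rotamer = current[pos]
        current[pos] = rng.randrange(len(self_e[pos]))
        new_e = total_energy(current, self_e, pair_e)
        delta = new_e - current_e
        # Accept downhill moves always, uphill moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current_e = new_e
            if current_e < best_e:
                best, best_e = list(current), current_e
        else:
            current[pos] = old_rotamer  # reject: restore the previous rotamer
    return tuple(best), best_e


# 12 positions with 5 rotamers each: 5 ** 12 combinations, sampled rather than enumerated.
self_e, pair_e = make_toy_problem(n_positions=12, n_rotamers=5)
best, best_e = monte_carlo_search(self_e, pair_e)
print("lowest energy found:", round(best_e, 3))
```

Unlike exhaustive enumeration, a stochastic search of this kind scores only a fixed number of combinations and returns a low-energy solution without any guarantee that it is the GMEC.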

COMPUTATIONAL DESIGN OF PROTEIN-PROTEIN INTERACTIONS

Protein-protein interactions are involved in a large number of cellular processes from

signal transduction to differentiation to apoptosis and others. Being able to create new or
