Inverse Folding Problem (Molecular Biology)

This problem arose from attempts to design proteins de novo. Here one begins by selecting a known target protein structure, for example, the globin fold, and freely designs amino acid sequences (primary structures) that should adopt this conformation spontaneously. Many sequences will be compatible with a particular target, because homologous proteins are known to adopt the same fold (see Homological Modeling). The solution becomes unique, however, if one searches for the sequence that folds into the target and gives the most stable structure. Such a sequence is likely to differ even from the native sequence of the target structure. In this procedure, one starts from a 3-D structure and works toward a sequence (Fig. 1), which is the opposite direction to protein folding and protein structure prediction, where one starts with the sequence and tries to find the 3-D structure.

Figure 1. Inverse folding problem. Given a particular 3-D structure, search for any amino acid sequence that would preferably adopt the fold. Note that the thinking is reversed in direction compared with that of protein folding.


One of the obstacles in solving the inverse folding problem is dealing with the amino acid side chains. The protein backbone can be kept fixed, exactly as in the target structure, but the side chains vary with the sequence. For each candidate sequence, the optimal packing of the side-chain atoms within the protein interior would need to be determined. To avoid this laborious task, a simplified treatment of side chains has been introduced (1): each of the 20 side chains of the standard amino acids is represented by one point (the Cβ atom for all residues except glycine), and its bulkiness and other physicochemical properties are effectively included in the interaction potential used, which is called the "mean-force energy potential" or the "Sippl potential." The position of the Cβ atom is fully determined by the dihedral angles of the backbone alone (see Ramachandran Plot), so it is not necessary to consider the side-chain conformation. The Sippl potential is a function of the distance between the Cβ atoms of two side chains and of the two amino acid types. The quality of the side-chain packing is measured quantitatively by the sum, over all pairs of side chains, of the interaction energies estimated by the Sippl potential.

The applicability of the Sippl potential has been established by the "Sippl test" (2). The native amino acid sequence a of one structure (A) is selected from a structural library of various known structures and mounted onto another structure (B) that is larger than A. Sequence a can be mounted onto structure B in various alignments, by shifting it one residue at a time without introducing any gaps. For each alignment, the total energy of the structure is estimated by using the Sippl potential. When all alignments have been evaluated, structure B is replaced by the next structure of the structural library, and eventually by structure A itself. If the native combination of sequence and structure gives the lowest energy among all the alternatives, it is concluded that the potential function is effective and useful. The Sippl test has been performed successfully with various Sippl-type potentials (3, 4), but this is not sufficient for the inverse folding problem. The Sippl test checks whether a sequence recognizes its structure among many structures, whereas the inverse folding problem requires the opposite: a structure recognizing its sequence among many sequences.
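The test described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the published procedure: `toy_pair_energy` is a hypothetical stand-in for a real Sippl-type potential (which would be derived from statistics over known structures), the 7 Å cutoff, the hydrophobic residue set, and the `min_separation` parameter are invented for the example, and the coordinates are made-up points standing in for Cβ positions.

```python
from itertools import combinations
from math import dist

# Assumption: a crude two-class alphabet, used only by the toy potential below.
HYDROPHOBIC = set("AVILMFWC")


def toy_pair_energy(aa1, aa2, distance):
    """Hypothetical stand-in for a Sippl-type potential.

    A real potential depends on both residue types and on the distance
    between the two interaction points; this one only mimics that signature.
    """
    if distance > 7.0:  # assumed contact cutoff (angstroms)
        return 0.0
    n_hydrophobic = (aa1 in HYDROPHOBIC) + (aa2 in HYDROPHOBIC)
    if n_hydrophobic == 2:
        return -1.0  # favorable hydrophobic contact
    if n_hydrophobic == 1:
        return 0.5   # unfavorable mixed contact
    return 0.0


def total_energy(sequence, coords, pair_energy, min_separation=2):
    """Sum pairwise energies over one interaction point per residue.

    coords[i] is the point representing residue i (the C-beta atom in the
    text; glycine would need its own convention, which is not handled here).
    """
    energy = 0.0
    for i, j in combinations(range(len(sequence)), 2):
        if j - i < min_separation:  # skip chain neighbors (assumed choice)
            continue
        energy += pair_energy(sequence[i], sequence[j], dist(coords[i], coords[j]))
    return energy


def gapless_threadings(sequence, coords, pair_energy):
    """Yield (offset, energy) for every gapless placement of sequence on coords."""
    n = len(sequence)
    for offset in range(len(coords) - n + 1):
        window = coords[offset:offset + n]
        yield offset, total_energy(sequence, window, pair_energy)


def best_placement(sequence, library, pair_energy):
    """Return (energy, structure_name, offset) of the lowest-energy placement."""
    scored = [
        (energy, name, offset)
        for name, coords in library.items()
        for offset, energy in gapless_threadings(sequence, coords, pair_energy)
    ]
    return min(scored)


# Made-up data: structure A brings residues 0 and 3 into contact;
# structure B is a longer, fully extended chain with no contacts.
STRUCTURE_A = [(0, 0, 0), (10, 0, 0), (10, 10, 0), (0, 5, 0)]
STRUCTURE_B = [(10 * k, 0, 0) for k in range(6)]
LIBRARY = {"A": STRUCTURE_A, "B": STRUCTURE_B}
NATIVE_SEQUENCE_A = "LKKL"

energy, name, offset = best_placement(NATIVE_SEQUENCE_A, LIBRARY, toy_pair_energy)
print(f"lowest energy {energy} on structure {name} at offset {offset}")
```

The potential passes the test for sequence a when the lowest-energy placement is the native structure A. A real test would repeat this for every sequence in the library and use a statistically derived potential in place of the toy function.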

Another difficulty with the inverse folding problem is whether the energy values of different systems (proteins with different sequences) can be compared with each other (5, 6). Of course, it is almost meaningless to compare the absolute energies of two different substances, such as ethane and ethanol. The energy of protein stability, however, is defined not as an absolute energy but as the energy difference between the folded and unfolded states of the protein, and this difference can be compared among different proteins. Ota et al. (7) introduced a simple approximation to estimate the energy of protein stabilization in combination with a Sippl-type potential energy and applied the method to analyzing the thermal stability of mutant proteins. Similar treatments could conceivably deal with the inverse folding problem and achieve the de novo design of proteins.
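The comparison argued for above can be written out as follows. The notation is introduced here for illustration and is not taken from the cited works: $s = (a_1, \dots, a_N)$ is a sequence mounted on the fixed target backbone, $r_{ij}$ is the distance between the interaction points of residues $i$ and $j$, and $e$ is a Sippl-type pair potential. How the unfolded-state term is estimated is the approximation attributed to Ota et al. (7), which the text does not spell out, so it is left unspecified.

```latex
\begin{align}
  E_{\mathrm{folded}}(s) &= \sum_{i<j} e\left(a_i, a_j, r_{ij}\right) \\
  \Delta E(s) &= E_{\mathrm{folded}}(s) - E_{\mathrm{unfolded}}(s) \\
  \Delta\Delta E(s \to s') &= \Delta E(s') - \Delta E(s)
\end{align}
```

In this notation, the absolute values $E_{\mathrm{folded}}(s)$ and $E_{\mathrm{folded}}(s')$ belong to different molecules and are not directly comparable, whereas $\Delta E(s)$ and $\Delta E(s')$ are both stabilization energies, so a negative $\Delta\Delta E$ would indicate that $s'$ is estimated to be more stable on the target fold than $s$.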
