Differential Evolution for Protein Structure Prediction Using the HP Model - Foundations on Natural and Artificial Computation

For the prediction of the final tertiary structure, methods range from comparison with resolved structures to “ab initio” prediction. In the first case, the search space is pruned by assuming that the target protein adopts a structure close to the experimentally determined structure of a homologous protein. However, the output of experimentally determined protein structures, obtained by time-consuming and relatively expensive X-ray crystallography or NMR spectroscopy, lags far behind the output of protein sequences. Because of this, ab initio prediction, the most difficult approach, remains a challenge in bioinformatics. It uses only the information from the amino acid sequence of the primary structure [21]. For such prediction there are models that simplify the complexity of the interactions and the nature of the amino acid elements, such as models that place the residues on a lattice, and there are detailed atomic models such as the Rosetta system [20]. Nevertheless, as Zhao [24] indicates, detailed atomic models cannot explore more than the small changes that occur over very short timescales, and they involve many parameters and approximations. For this reason, simplified or minimalist models are employed. The use of a reduced alphabet of amino acids is based on the recognition that the binary pattern of hydrophobic and polar residues is a major determinant of the folding of a protein.

In the HP model [6] the elements of the chain can be of two types: H (hydrophobic residues) and P (polar residues). The sequence is assumed to be embedded in a lattice that discretizes the conformational space and can have different topologies, such as 2D square or triangular lattices, or 3D cubic or diamond lattices. The interaction energy between two H elements that are adjacent in the lattice (and not consecutive in the primary sequence) is -1, and it is zero for all other pairs. That is, the HP energy matrix contains only attractions (H with H) and neutral interactions (P with P and P with H). Given a primary sequence, the problem is to find the folding in the lattice that minimizes the energy. The problem has been shown to be NP-hard [10,23] and progress has been slow; as Unger points out, “minimal progress was achieved in the category of ab initio folding” [22]. Although the HP model is simple, it is powerful enough to capture many properties of actual proteins. It is non-trivial, captures many global aspects of real proteins, and retains the hardness features of the original biological problem [8]. For this reason, many authors have applied evolutionary algorithms [22,24] to the direct prediction of native conformations using the HP model, as we detail in the next section.
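
The energy rule described above can be made concrete with a small sketch. It assumes a 2D square lattice and an absolute move encoding (U, D, L, R) for the conformation. The text does not fix either choice here, and the names `MOVES`, `fold` and `hp_energy` are hypothetical, introduced only for illustration. A conformation that revisits a lattice site is treated as illegal and reported as `None`.

```python
# Minimal sketch of the HP energy function on a 2D square lattice.
# Assumptions (not from the source): absolute moves U/D/L/R, chain starts
# at the origin, and all helper names below are illustrative.

MOVES = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}


def fold(moves):
    """Return the list of lattice coordinates visited by the chain,
    or None if the chain collides with itself (illegal conformation)."""
    x, y = 0, 0
    coords = [(x, y)]
    occupied = {(x, y)}
    for move in moves:
        dx, dy = MOVES[move]
        x, y = x + dx, y + dy
        if (x, y) in occupied:
            return None  # two residues on the same site
        occupied.add((x, y))
        coords.append((x, y))
    return coords


def hp_energy(sequence, moves):
    """Return -1 for every pair of H residues that are lattice neighbours
    but not consecutive in the sequence; None for illegal conformations."""
    if len(moves) != len(sequence) - 1:
        raise ValueError("a chain of n residues needs n - 1 moves")
    coords = fold(moves)
    if coords is None:
        return None
    index_at = {site: i for i, site in enumerate(coords)}
    energy = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Inspect only the right and upper neighbours so that each
        # adjacent pair of sites is counted exactly once.
        for dx, dy in ((1, 0), (0, 1)):
            j = index_at.get((x + dx, y + dy))
            if j is not None and abs(i - j) > 1 and sequence[j] == "H":
                energy -= 1
    return energy
```

For example, the sequence `"HPPH"` folded with moves `"RUL"` forms a unit square in which the two H residues become lattice neighbours, so `hp_energy("HPPH", "RUL")` gives -1, whereas the fully extended fold `"RRR"` gives 0.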

Additionally, we must take into account that the energy landscape of this problem presents a multitude of local energy minima separated by high barriers. As Zhao indicates, “there are many meta-stable states whose energies are very close to the global minimum and an exceedingly small number of global optimal states. Folding energy landscapes are funnel-like” [24]. For this reason, we will test the capability of Differential Evolution as a method that offers better control over the balance between exploration and exploitation than a classical genetic algorithm, as detailed in Section 3. Moreover, we will introduce methods to translate illegal protein conformations into feasible ones, smoothing the landscape (Sect. 3.2). Finally, we will test our proposals with benchmark series (Sect. 4).
