Divide and Conquer Strategies for Protein Structure Prediction - Mathematical Approaches to Polymer Sequence Analysis and Related Problems

Biomedical Engineering Reference


of the most important and most studied problems of computational biology [26]. Despite many efforts, an acceptable solution for new sequences that have no homologous sequences of known 3D structure is still to be found. Given the difficulty of computing the protein 3D structure directly, many intermediate problems have been addressed. One way to simplify the problem is to compute features that are local with respect to the backbone of the protein. These are called secondary structure motifs and are well characterised as alpha-helices, beta-sheets and coil on the basis of specific values of torsion angles. The problem of predicting secondary structures in proteins has also been addressed with machine learning methods, and it is presently considered one of the most successfully addressed problems of computational biology [43]. In this chapter, we will comment on the most successful implementations of protein secondary structure prediction methods.
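Since secondary structure motifs are characterised by torsion angle values, it may help to see how a single torsion (dihedral) angle is obtained from four consecutive atom positions. The following is a minimal sketch, not the chapter's own method: the function name `torsion_angle` and the helper names are hypothetical, coordinates are assumed to be plain (x, y, z) tuples, and the sign follows the usual convention in which a clockwise rotation of the far bond, seen along the central bond, is positive. With backbone atoms, the angle phi would be computed on C(i-1), N(i), CA(i), C(i) and psi on N(i), CA(i), C(i), N(i+1).

```python
import math


def _sub(a, b):
    """Component-wise difference a - b of two 3D points."""
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    """Dot product of two 3D vectors."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    """Cross product of two 3D vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def torsion_angle(p0, p1, p2, p3):
    """Torsion angle in degrees, in (-180, 180], defined by four points.

    The angle is measured between the plane through p0, p1, p2 and the
    plane through p1, p2, p3. Collinear points give a degenerate result.
    """
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the first plane
    n2 = _cross(b2, b3)  # normal of the second plane
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    return math.degrees(math.atan2(y, x))
```

For instance, `torsion_angle((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))` describes two planes at a right angle to each other.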

However, even when well predicted, secondary structure alone does not carry enough information to understand protein 3D conformation. To this aim, it would suffice to find global distance constraints between each pair of residues. This sub-problem is commonly known as residue-residue contact prediction, and it has again been addressed with machine learning methods [3]. Residue-residue contact prediction is today the only method that can capture, in a simplified manner, long-range interactions between residues of a protein sequence. Although the problem is still far from being solved, we will review the most efficient algorithms, which are presently the state-of-the-art methods in the field.
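To make the notion of a residue-residue contact concrete, the sketch below derives a binary contact map from known coordinates, which is the kind of target a contact predictor is trained to reproduce. It is an illustration under stated assumptions rather than a definition taken from this chapter: the function name `contact_map` is hypothetical, each residue is represented by a single point (for example its C-beta atom), and the 8 Å distance threshold is a commonly used convention that is assumed here.

```python
import math


def contact_map(coords, threshold=8.0, min_separation=1):
    """Return a symmetric 0/1 matrix marking residue pairs in contact.

    coords: one (x, y, z) point per residue, in sequence order.
    threshold: distance cut-off in angstroms (assumed value, see lead-in).
    min_separation: minimum sequence distance |i - j| for a pair to count.
    """
    n = len(coords)
    cmap = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + min_separation, n):
            if math.dist(coords[i], coords[j]) < threshold:
                cmap[i][j] = 1
                cmap[j][i] = 1
    return cmap


# Toy example: four residues placed 3.8 angstroms apart on a straight line.
toy_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
toy_map = contact_map(toy_coords)
```

Raising `min_separation` restricts the map to pairs that are far apart along the sequence, which is where the long-range information mentioned above resides.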

So far, the most interesting results in secondary structure prediction and residue-residue contact prediction have been achieved by a clever combination of machine-learning methods with evolutionary information available in the ever-growing databases of protein structures [1, 11, 18, 20].

In order to make the chapter as self-contained as possible, the following sections briefly review the basic concepts of machine learning methods (Sect. 2.2) and the most commonly used techniques for extracting evolutionary information from databases of protein sequences (Sect. 2.3). The rest of the chapter is devoted to a detailed description of the best-known secondary structure predictors (Sect. 2.4) and residue-residue contact predictors (Sect. 2.5). For both topics, we also describe in detail the standard evaluation criteria adopted to measure the performance of the predictors, and we outline the state of the art with respect to those criteria according to the experiments performed at the CASP meetings¹ and on the EVA server².

2.2 Data Classification with Machine Learning Methods

Machine learning is concerned with the design and development of algorithms for the acquisition and integration of knowledge. Biological data classification is a typical problem usually approached with machine learning methods.

1 http://predictioncenter.org/

2 http://cubic.bioc.columbia.edu/eva/
