of the most important and most studied problems of computational biology [26].
Despite many efforts, an acceptable solution is still to be found for new sequences
that have no homologous sequences of known 3D structure. Given the difficulty of
computing the protein 3D structure directly, many intermediate problems have been
addressed. One way to simplify the problem is to compute features that are local
with respect to the backbone of the protein. These are called secondary structure
motifs and are well characterised as alpha-helices, beta-sheets and coil on the
basis of specific values of torsion angles. The problem of predicting secondary
structures in proteins has also been addressed with machine learning methods, and
it is presently considered one of the most successfully addressed problems of
computational biology [43]. In this chapter, we will comment on the most successful
implementations of protein secondary structure prediction methods.
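To make the link between torsion angles and secondary structure motifs concrete, the following minimal Python sketch labels each residue as helix (H), strand (E) or coil (C) from its backbone (phi, psi) angles. The angle ranges and the function names are illustrative assumptions, not values taken from this chapter, and reference assignment programs use more elaborate criteria.

```python
def classify_torsion(phi, psi):
    """Label one residue from its backbone torsion angles (in degrees).

    The region boundaries below are coarse, assumed values chosen only
    for illustration; they are not taken from the chapter.
    """
    # Assumed alpha-helical region (typical helix: phi ~ -57, psi ~ -47).
    if -100.0 <= phi <= -30.0 and -80.0 <= psi <= -5.0:
        return "H"
    # Assumed beta-strand region (typical strand: phi ~ -120, psi ~ 130).
    if -180.0 <= phi <= -45.0 and (psi >= 90.0 or psi <= -150.0):
        return "E"
    # Everything else is treated as coil.
    return "C"


def torsions_to_string(angles):
    """Map a list of (phi, psi) pairs to a string of H/E/C labels."""
    return "".join(classify_torsion(phi, psi) for phi, psi in angles)


if __name__ == "__main__":
    example = [(-57.0, -47.0), (-139.0, 135.0), (60.0, 40.0)]
    print(torsions_to_string(example))
```

A three-letter string of this kind is the usual target of a secondary structure predictor: one label per residue of the sequence.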
However, even when well predicted, secondary structure alone does not carry
enough information to understand protein 3D conformation. For this purpose, it would
suffice to find global distance constraints between each pair of residues. This
sub-problem is commonly known as residue-residue contact prediction and it has
also been addressed with machine learning methods [3]. Residue-residue contact
prediction is today the only method that can capture, in a simplified manner,
long-range interactions between residues of a protein sequence. Although the problem
is still far from being solved, we will review the most effective algorithms, which
presently represent the state of the art in the field.
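As an illustration of what a contact predictor tries to reproduce, the sketch below derives a binary contact map from known 3D coordinates. It assumes one representative atom per residue, a distance threshold of 8 Angstrom and a minimum sequence separation of 6 residues; these values and the function name are assumptions made for the example and are not stated in this chapter.

```python
import math


def contact_map(coords, threshold=8.0, min_separation=6):
    """Return a symmetric boolean matrix of residue-residue contacts.

    coords: list of (x, y, z) tuples, one representative atom per residue.
    threshold: assumed distance cutoff in Angstrom.
    min_separation: assumed minimum |i - j| so that trivially close
        sequence neighbours are not counted as contacts.
    """
    n = len(coords)
    contacts = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + min_separation, n):
            if math.dist(coords[i], coords[j]) < threshold:
                contacts[i][j] = True
                contacts[j][i] = True
    return contacts


if __name__ == "__main__":
    # Toy chain: seven residues on a line, then one residue folding back.
    chain = [(3.8 * i, 0.0, 0.0) for i in range(7)] + [(0.0, 5.0, 0.0)]
    cmap = contact_map(chain)
    pairs = [(i, j) for i in range(len(chain))
             for j in range(i + 1, len(chain)) if cmap[i][j]]
    print(pairs)
```

A predictor outputs an estimate of such a matrix from the sequence alone; the entries far from the diagonal correspond to the long-range interactions mentioned above.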
So far, the most interesting results in secondary structure prediction and
residue-residue contact prediction have been achieved by a clever combination of
machine learning methods with evolutionary information available in the ever-growing
databases of protein structures [1, 11, 18, 20].
In order to make the chapter as self-contained as possible, in the following
sections we briefly review the basic concepts of machine learning methods
(Sect. 2.2) and the most commonly used techniques for extracting evolutionary
information from databases of protein sequences (Sect. 2.3). The rest of the chapter
is devoted to the detailed description of the best-known secondary structure
predictors (Sect. 2.4) and residue-residue contact predictors (Sect. 2.5). For both
topics, we also describe in detail the standard evaluation criteria adopted to measure
the performance of the predictors and outline the state of the art in terms of
these criteria, according to the experiments performed at the CASP meetings
(see footnote 1) and on the EVA server (see footnote 2).
2.2 Data Classification with Machine Learning Methods
Machine learning is concerned with the design and development of algorithms for
the acquisition and integration of knowledge. Biological data classification is a
typical problem usually approached with machine learning methods.
1 http://predictioncenter.org/
2 http://cubic.bioc.columbia.edu/eva/