High-Performance Alignment Methods for Protein Threading - Parallel Computing for Bioinformatics and Computational Biology

Biomedical Engineering Reference

In-Depth Information

that there is no arbitrary close polynomial approximation algorithm, unless P

NP.

These complexity results have guided the research of threading methods to three

different directions.

The first approach consists of simplifying the problem by ignoring the pairwise

amino acid interactions. In this case the optimal alignment can be found by polynomial

dynamic programming algorithms [17, 18]. The methods based on this approach

ignore potentially rich source of structural information and, consequently, cannot

recognize distant structural homologies.

The second direction is to use approximate algorithms which are relatively fast

and capable of finding a good but not necessarily the optimal alignment. These meth-

ods include several algorithms based on dynamic programming [19-21], statistical

sampling [22], and genetic algorithms [23-25]. As these methods do not guarantee

optimality, their use risks to worsen the fold recognition sensitivity and quality.

The third way to attack PTP is to develop dedicated exact methods which are

exponential in the worst case, but efficient on most of the real-life instances. This

chapter traces the later direction of research and presents recently developed high-

performance exact algorithms for solving PTP, which use advanced mathematical

programming techniques, as well as parallel and distributed computing methods.

Subsequently, we summarize what we feel to be the most important steps in this

direction.

Lathrop and Smith [26] designed the first practical branch-and-bound (B&B)

algorithm for PTP. This algorithm became the kernel of several structure prediction

software packages [14, 27]. Lathrop and Smith's work has shown that the problem is

easier in practice than in theory and that it is possible to solve real-life (biological)

instances in a reasonable amount of time. However, practical applications of this B&B

algorithm remained limited to moderate-size instances. These results have drawn the

attention of many researchers to the problem, the authors included.

The second main step involves using mathematical programming techniques to

solve PTP. Two teams have been working independently in this direction and their

results were published almost simultaneously. Andonov et al. [28-30] proposed dif-

ferent mixed integer programming (MIP) models for PTP. The content of this chapter

is based essentially on the results from [30]. Xu et al. [31, 32] also reported successful

use of an MIP model in their protein threading package RAPTOR. The main draw-

back of mathematical programming approaches is that the corresponding models are

often very large (over 10 6 variables). Even the most advanced MIP solvers cannot

solve instances of such size in reasonable time and, in some cases, even to stock them

in computer memory. Different divide-and-conquer methods and parallel algorithms

are used to overcome this drawback.

Xu et al. [33, 34] were the first to use divide-and-conquer in their package

PROSPECT-I. Their algorithm performs well on simple template interaction topolo-

gies, but is inefficient for protein templates with dense pairwise amino acid interac-

tions. The ideas from PROSPECT-I were used in the latest version of RAPTOR [35].

Andonov et al. [29, 30] proposed a different divide-and-conquer strategy. While the

splits in Ref. [33] occur along the interactions between template blocks, the splits in

Refs. [29, 30] are along the possible positions of a given template block. In addition,

=

Parallel Computing for Bioinformatics and Computational Biology

Search WWH ::

Custom Search

Home