Parallel Implementations of Local Sequence Alignment: Hardware and Software - Parallel Computing for Bioinformatics and Computational Biology

Biomedical Engineering Reference

In-Depth Information

sequences from the input set. The similarity matrix is used to construct a tree that

will guide the multiple sequence alignment. Closely related sequence pairs are aligned

first resulting in partial alignments. These partial alignments are then either combined

with other neighboring partial alignments or sequences in the guiding tree. The com-

putational complexity of ClustalW is reduced from being exponential when dynamic

programming based multiple alignment is used to a second-order polynomial.

For n sequences, the number of comparisons to be made are n(n

1 )/ 2, which

is very large as the number of sequences increases. This pairwise comparison can

be done in parallel. There are different approaches such as ClustalW-MPI, SGI's HT

ClustalW, and MULTICLUSTAL. These approaches increase the speed of aligning

multiple sequences. ClustalW-MPI and HTClustalW will be discussed in detail in the

following sections.

The demand for computational power in the bioinformatic field will continue to

grow as the complexity and the volume of data increases. This computational power

can only be delivered by large-scale parallel computers that either have distributed

memory architecture or shared memory architecture. Distributed computing has been

already used successfully in sequence alignment. In general, most of these applications

have been implemented using the work pool approach with coarse grain parallelism.

This type of implementation is ideal for clusters built with off-the-shelve personal

computers. Sequence alignment is expected to continue to draw increasing attention

and it will drive several high performance computing efforts.

−

10.3 SMITH-WATERMAN ALGORITHM

When looking for similarities between subsequences of two sequences, as is usu-

ally the goal in the methods used to find homologies by database searches, a local

alignment method is more appropriate than a global. The simple dynamic program-

ming algorithm described by Smith and Waterman [6] is the basis for this type of

alignments. The Smith-Waterman algorithm is perhaps the most widely used local

similarity algorithm for biological sequence database searching.

In Smith-Waterman database searches, the dynamic programming method is used

to compare every database sequence against the query sequence and assign a score

to each result. The dynamic programming method checks every possible alignment

between two given sequences. This algorithm can be used both to compute the optimal

alignment score and to create the actual alignment. It uses memory space proportional

to the product of the lengths of the two sequences, mn , and computing time propor-

tional to mn(m

+

n) . The recursion relations used in the original Smith-Waterman

algorithm are the following:

H i , j =

max

{

H i − 1, j − 1 , S

[

ai , bj

]

, E i , j , F i , j }

where

E i , j =

0 <k<i {

max

H i − k , j −

g(k)

}

, F i , j =

0 <l<j {

max

H i , j − l −

g(l)

}

Parallel Computing for Bioinformatics and Computational Biology

Search WWH ::

Custom Search

Home