Biomedical Engineering Reference
In-Depth Information
sequences from the input set. The similarity matrix is used to construct a tree that
will guide the multiple sequence alignment. Closely related sequence pairs are aligned
first resulting in partial alignments. These partial alignments are then either combined
with other neighboring partial alignments or sequences in the guiding tree. The com-
putational complexity of ClustalW is reduced from being exponential when dynamic
programming based multiple alignment is used to a second-order polynomial.
For n sequences, the number of comparisons to be made are n(n
1 )/ 2, which
is very large as the number of sequences increases. This pairwise comparison can
be done in parallel. There are different approaches such as ClustalW-MPI, SGI's HT
ClustalW, and MULTICLUSTAL. These approaches increase the speed of aligning
multiple sequences. ClustalW-MPI and HTClustalW will be discussed in detail in the
following sections.
The demand for computational power in the bioinformatic field will continue to
grow as the complexity and the volume of data increases. This computational power
can only be delivered by large-scale parallel computers that either have distributed
memory architecture or shared memory architecture. Distributed computing has been
already used successfully in sequence alignment. In general, most of these applications
have been implemented using the work pool approach with coarse grain parallelism.
This type of implementation is ideal for clusters built with off-the-shelve personal
computers. Sequence alignment is expected to continue to draw increasing attention
and it will drive several high performance computing efforts.
10.3 SMITH-WATERMAN ALGORITHM
When looking for similarities between subsequences of two sequences, as is usu-
ally the goal in the methods used to find homologies by database searches, a local
alignment method is more appropriate than a global. The simple dynamic program-
ming algorithm described by Smith and Waterman [6] is the basis for this type of
alignments. The Smith-Waterman algorithm is perhaps the most widely used local
similarity algorithm for biological sequence database searching.
In Smith-Waterman database searches, the dynamic programming method is used
to compare every database sequence against the query sequence and assign a score
to each result. The dynamic programming method checks every possible alignment
between two given sequences. This algorithm can be used both to compute the optimal
alignment score and to create the actual alignment. It uses memory space proportional
to the product of the lengths of the two sequences, mn , and computing time propor-
tional to mn(m
+
n) . The recursion relations used in the original Smith-Waterman
algorithm are the following:
H i , j =
max
{
H i 1, j 1 , S
[
ai , bj
]
, E i , j , F i , j }
where
E i , j =
0 <k<i {
max
H i k , j
g(k)
}
, F i , j =
0 <l<j {
max
H i , j l
g(l)
}
Search WWH ::




Custom Search