Special-Purpose Computing for Biological Sequence Analysis - Parallel Computing for Bioinformatics and Computational Biology

Biomedical Engineering Reference

In-Depth Information

of the substitution table corresponding to the respective character is loaded into each

PE as well as the constants α and β . B is then completely shifted through the array in

−

1 steps as displayed in Figure 7.8.

In iteration step k ,1

≤

−

1, the values M(i , j) , D(i , j) , and I(i , j) for

all i , j with 1

≤

M ,1

≤

K , and k

−

1 are computed in parallel in the

PEs (m , n) with m

−

i div N

−

1 and n

−

i mod N

−

1. For this calcula-

tion, PE (m , n) receives the values M(i

−

1, j) , I(i

−

1, j) , and b j from its eastern

neighbor (m , n

1 ) if n < N

−

1 or from PE (m

1, 0 ) if n

−

1 and m < N

−

whereas the values M(i

−

1, j

−

1 ) , M(i , j

−

1 ) , D(i , j

−

1 ) , a i , α , β , and sbt (a i , b j )

are stored locally. The lower right PE (N

−

1, N

−

1 ) receives b j in steps j with

1 from the lower western IP and zeros otherwise.

Because of the efficient row ringshift and row broadcast, these routing operations

can be accomplished in constant time on the ISA. Thus, it takes M

≤

−

1 steps to

compute the alignment cost of the two sequences with the SW algorithm. However,

notice that after the last character of B enters the array, the first character of a new sub-

ject sequence can be input for the next iteration step. Thus, all subject sequences of the

database can be pipelined with only one step delay between two different sequences.

Assuming k sequences of length K and K

−

O(M) , we compute K sequence align-

ments in time O(KM) using O(M) processors. As the best sequential algorithm takes

O(KM 2 ) steps, our parallel implementation achieves maximal efficiency. The same

technique can be applied for mapping the Viterbi algorithm onto the ISA.

Because of the limited PE memory, only the highest score of matrix M is computed

on Systola 1024 for each pairwise comparison. Ranking the compared sequences and

reconstructing the alignments are carried out by the front-end PC. Because this last

operation is only performed for very few subject sequences, its computation time is

negligible. In our ISA algorithm, the maximum computation of the matrix M can

be easily incorporated with only a constant time penalty: After each iteration step,

all PEs compute a new value max by taking the maximum of the newly computed

H -value and the old value of max from its neighboring PE. After the last character

of a subject sequence has been processed in PE ( 0, 0 ) , the maximum of matrix M is

stored in PE ( 0, 0 ) , which is written into the adjacent western IP.

So far, we have assumed a processor array equal in size of the query sequence

length (M

N 2 ) . In practice, this rarely happens. Assuming a query sequence length

of M

kN with k a multiple of N or N a multiple of k , the algorithm is modified as

follows:

1. k

≤

N : In this case, we can just replicate the algorithm for a k

N ISA on an

N subarray computes the alignment of the same

query sequence with different subject sequences.

2. k > N : A possible solution is to assign k/N characters of the sequences to each

PE instead of one. However, in this case, the memory size has to be sufficient

to store k/N rows of the substitution table (20 values per row, as there are 20

different amino acids), that is, on Systola 1024, it is only possible to assign

maximally two characters per PE for the SW algorithm (only one character

N ISA, that is, each k

Parallel Computing for Bioinformatics and Computational Biology

Search WWH ::

Custom Search

Home