Biomedical Engineering Reference
In-Depth Information
of the substitution table corresponding to the respective character is loaded into each
PE as well as the constants α and β . B is then completely shifted through the array in
M
+
1 steps as displayed in Figure 7.8.
In iteration step k ,1
K
k
M
+
K
1, the values M(i , j) , D(i , j) , and I(i , j) for
all i , j with 1
i
M ,1
j
K , and k
=
i
+
j
1 are computed in parallel in the
PEs (m , n) with m
=
N
i div N
1 and n
=
N
i mod N
1. For this calcula-
tion, PE (m , n) receives the values M(i
1, j) , I(i
1, j) , and b j from its eastern
neighbor (m , n
+
1 ) if n < N
1 or from PE (m
+
1, 0 ) if n
=
N
1 and m < N
1,
whereas the values M(i
1, j
1 ) , M(i , j
1 ) , D(i , j
1 ) , a i , α , β , and sbt (a i , b j )
are stored locally. The lower right PE (N
1, N
1 ) receives b j in steps j with
0
1 from the lower western IP and zeros otherwise.
Because of the efficient row ringshift and row broadcast, these routing operations
can be accomplished in constant time on the ISA. Thus, it takes M
j
K
1 steps to
compute the alignment cost of the two sequences with the SW algorithm. However,
notice that after the last character of B enters the array, the first character of a new sub-
ject sequence can be input for the next iteration step. Thus, all subject sequences of the
database can be pipelined with only one step delay between two different sequences.
Assuming k sequences of length K and K
+
K
O(M) , we compute K sequence align-
ments in time O(KM) using O(M) processors. As the best sequential algorithm takes
O(KM 2 ) steps, our parallel implementation achieves maximal efficiency. The same
technique can be applied for mapping the Viterbi algorithm onto the ISA.
Because of the limited PE memory, only the highest score of matrix M is computed
on Systola 1024 for each pairwise comparison. Ranking the compared sequences and
reconstructing the alignments are carried out by the front-end PC. Because this last
operation is only performed for very few subject sequences, its computation time is
negligible. In our ISA algorithm, the maximum computation of the matrix M can
be easily incorporated with only a constant time penalty: After each iteration step,
all PEs compute a new value max by taking the maximum of the newly computed
H -value and the old value of max from its neighboring PE. After the last character
of a subject sequence has been processed in PE ( 0, 0 ) , the maximum of matrix M is
stored in PE ( 0, 0 ) , which is written into the adjacent western IP.
So far, we have assumed a processor array equal in size of the query sequence
length (M
=
N 2 ) . In practice, this rarely happens. Assuming a query sequence length
=
of M
=
kN with k a multiple of N or N a multiple of k , the algorithm is modified as
follows:
1. k
N : In this case, we can just replicate the algorithm for a k
×
N ISA on an
N subarray computes the alignment of the same
query sequence with different subject sequences.
2. k > N : A possible solution is to assign k/N characters of the sequences to each
PE instead of one. However, in this case, the memory size has to be sufficient
to store k/N rows of the substitution table (20 values per row, as there are 20
different amino acids), that is, on Systola 1024, it is only possible to assign
maximally two characters per PE for the SW algorithm (only one character
N
×
N ISA, that is, each k
×
Search WWH ::




Custom Search