Special-Purpose Computing for Biological Sequence Analysis - Parallel Computing for Bioinformatics and Computational Biology

Biomedical Engineering Reference

In-Depth Information

TABLE 7.3 Scan Times (in seconds) of SwissProt (release

40, 113997 Protein Sequences) for Various HMM Lengths

with the Viterbi algorithm on Systola 1024 and a PC

Cluster with 16 Systolas

HMM length

112

222

490

Systola 1024

152

288

546

Speedup

3

3.5

4

PC cluster of 16 Systolas

12

22

40

Speedup

37

45

56

Note : The speedup compared to a Pentium III 1 GHz is also reported.

Systola 1024 is around two times faster than the much larger 1K-PE MasPar and the

cluster of 16 Systolas is around two times faster than a 16K-PE MasPar. The one-

board Kestrel is four to five times faster than a Systola board. Kestrel's design [8]

is also a programmable fine-grained parallel architecture implemented as a PC add-

on board. It reaches the higher performance, because it has been built with 0.5

m

µ

CMOS technology, in comparison with 1.0

m for Systola 1024. Extrapolating to

this technology both approaches should perform equally. However, the difference

between both architectures is that Kestrel is purely a linear array, whereas Systola is a

mesh. This makes the Systola 1024 a more flexible design, suitable for a wider range

of applications; see, for example, [27-32].

µ

7.4.1 Performance Evaluation of a Reconfigurable Platform

To evaluate our approach on a device based on state-of-the-art technology, we have

implemented the SW algorithm on a modern reconfigurable architecture. Figure 7.9

shows our PE design for computing one cell of the SW dynamic programming matrix.

A linear systolic array of this PE can be built to compute the SW algorithm as shown

in Figure 7.7.

As the query sequence is usually larger than the processor array, we have to do a

partitioning on a fixed-size linear processor array. For sake of clarity, we first assume

a query sequence of length M and a processor array of size N , where M is a multiple

of N , that is, M

1 is an integer. A possible solution is to split the

computation into k passes. Unfortunately, this solution requires a large amount of off-

chip memory. The memory requirement can be reduced by factor p by splitting the

database into p equal-sized pieces and computing the alignment scores of all subject

sequences within each piece.

However, this approach also increases the loading time of substitution table

columns by factor p . To eliminate this loading time, we have slightly extended our PE

design. Each PE now stores k columns of the substitution table instead of only one.

Although this increases the area of each PE, it allows for alignment of each database

sequence with the complete query sequence without additional delays. It also reduces

the required off-chip memory for storing intermediate results to four times the longest

=

kN where k

≥

Parallel Computing for Bioinformatics and Computational Biology

Search WWH ::

Custom Search

Home