Biomedical Engineering Reference
As shown in the block diagram, up to four 3 × 3 filter calculations can be performed
in parallel if the data input can provide four consecutive data words. Besides four
implementations of the 3 × 3 filter calculation block, the register matrix and the
delay memory must be extended and reconnected to provide the appropriate pixel
data for each filter calculation block.
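The data sharing described above can be illustrated with a minimal software sketch. It is not the FPGA design itself: the function names, the list-of-lists image layout, and the choice to leave border pixels at zero are all assumptions made for illustration. The sketch shows that four horizontally adjacent 3 × 3 outputs can be served from one shared window of 3 rows by 6 columns, which is why the register matrix has to be widened when four filter blocks work side by side.

```python
def filter3x3_at(image, coeffs, row, col):
    """Reference: a single 3x3 FIR output centred on (row, col)."""
    return sum(
        coeffs[i][j] * image[row - 1 + i][col - 1 + j]
        for i in range(3)
        for j in range(3)
    )


def filter_four_wide(image, coeffs):
    """Compute interior outputs up to four at a time from one shared window.

    Hypothetical helper: border pixels are simply left at 0.
    """
    height, width = len(image), len(image[0])
    out = [[0] * width for _ in range(height)]
    for row in range(1, height - 1):
        col = 1
        while col < width - 1:
            # Up to four lanes; fewer only at the right edge of a line.
            lanes = min(4, width - 1 - col)
            # Shared window of 3 rows x (lanes + 2) columns, standing in for
            # the extended register matrix fed by the delay memory.
            window = [
                image[row - 1 + i][col - 1:col - 1 + lanes + 2]
                for i in range(3)
            ]
            # Each lane models one filter calculation block. In hardware the
            # lanes would run concurrently; here they run one after another.
            for lane in range(lanes):
                out[row][col + lane] = sum(
                    coeffs[i][j] * window[i][lane + j]
                    for i in range(3)
                    for j in range(3)
                )
            col += lanes
    return out
```

Every lane reads only from the shared window, so no lane needs its own access to the image memory. This is the software counterpart of reconnecting the register matrix so that each filter block receives the appropriate pixel data.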
This extended parallel architecture executes four times faster than the solution
with only one filter block, mainly because each filter block has its own
mathematical operations, which run completely in parallel with the others. Compared to
the initial approach, in which the pixel data were processed sequentially, the speedup
is 36.
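The two speedup figures can be related with a simple step-count model. This is a sketch under stated assumptions, not a measurement: it assumes that the sequential approach spends one step on each of the nine coefficient products of a pixel, that a filter block delivers one finished pixel per step, and that pipeline fill latency is negligible. The image size is a hypothetical example value.

```python
import math

TAPS = 3 * 3  # coefficient products per output pixel of a 3x3 filter


def sequential_steps(pixels):
    """Assumed model: one step per coefficient product, one pixel at a time."""
    return pixels * TAPS


def parallel_steps(pixels, blocks):
    """Assumed model: each filter block finishes one pixel per step."""
    return math.ceil(pixels / blocks)


pixels = 640 * 480  # hypothetical image size, not taken from the text
speedup_one_block = sequential_steps(pixels) / parallel_steps(pixels, 1)
speedup_four_blocks = sequential_steps(pixels) / parallel_steps(pixels, 4)
```

Under this model one filter block is 9 times faster than the sequential loop and four blocks are 36 times faster, which is consistent with the factor of four between the two parallel variants.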
28.4.3 Image Filter Example Conclusions
Starting from the question of whether the innermost loop of the FIR 3 × 3 image filter
task can be executed in parallel, this section has shown a possible solution in the form
of a parallel architecture. The basic principles of parallel execution and the
optimization steps have been shown in detail.
In principle, the innermost loop of the FIR image filter is a fine-grained algorithm
and is therefore not suited to acceleration on clusters. It has been shown that such
fine-grained algorithms in particular are very well suited to execution on a parallel
architecture, because the operations are connected directly using dedicated buses and
connections.
The described architecture for the 3 × 3 FIR filter can be implemented on an FPGA
processor board that provides a connection to local memory. The parallel architecture
is implemented within the FPGA processor, and the available local memory
can be used for temporary storage of the image data. The parallel architecture is
integrated into the software through the API.
In summary, the direct implementation of the architecture and the flexibility of FPGA
processors make the FPGA coprocessor an ideal platform for this type of fine-grained
algorithm. Furthermore, a modification of the coefficients or an increase in the filter
size or the data words can be integrated easily into the existing architecture, and the
modified algorithm can be executed within the same FPGA processor.
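How such modifications scale can be estimated with a rough sizing sketch. The function and field names are hypothetical, and the delay memory figure assumes a conventional line-buffer arrangement in which the previous K - 1 image lines are retained for a K × K filter. The window width follows from the geometry alone: each additional parallel block shifts the window by one column.

```python
def sizing(kernel_size, blocks, line_width):
    """Rough estimate for `blocks` parallel kernel_size x kernel_size filters.

    Hypothetical helper for illustration; real designs may differ.
    """
    return {
        # Register matrix height equals the filter height.
        "window_rows": kernel_size,
        # Adjacent outputs share columns: one extra column per extra block.
        "window_cols": kernel_size + blocks - 1,
        # One multiplier per coefficient in every filter block.
        "multipliers": kernel_size * kernel_size * blocks,
        # Assumption: the delay memory holds kernel_size - 1 image lines.
        "delay_words": (kernel_size - 1) * line_width,
    }


# line_width=640 is an example value, not taken from the text.
current = sizing(kernel_size=3, blocks=4, line_width=640)
larger = sizing(kernel_size=5, blocks=4, line_width=640)
```

Changing the coefficients leaves all of these figures untouched, whereas moving from a 3 × 3 to a 5 × 5 filter enlarges the window, the multiplier count, and the delay memory.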
28.5 CASE STUDY: PROTEIN STRUCTURE PREDICTION
The described FPGA processor is able to execute several kinds of applications and
can be seen as a general-purpose computing processor. This general usability is
presented in this section using a concrete application from the field of bioinformatics.
The complete application for protein structure prediction, a view of the parallelized
architecture, and the integration into an existing software environment are described.
This one approach is shown in detail, rather than numerous application examples
that would be suitable for execution on an FPGA processor.