High-Speed VLSI Architecture Based on Massively Parallel Processor Arrays for Real-Time Remote Sensing Applications - Applications of Digital Signal Processing - page 143

Image Processing Reference

In-Depth Information

u

j

F

1 kk



ˆ

u

b

RFS ( )

j

1 kk



Fig. 1. VLSI-FPGA platform of the RSF/RASF algorithms via the HW/SW co-design

paradigm.

The basic algebraic matrix operation (i.e., the selected matrix-vector multiplication) that

constitutes the base of the most computationally consuming applications in the

reconstructive SP applications is transformed into the required parallel algorithmic

representation format. A manifold of different approaches can be used to represent parallel

algorithms, e.g. (Moldovan & Fortes, 1986), (Kung, 1988). In this study, we consider a

number of different loop optimization techniques used in high performance computing

(HPC) in order to exploit the maximum possible parallelism in the design:

- Loop unrolling,

- Nested loop optimization,

- Loop interchange.

In addition, to achieve such maximum possible parallelism in an algorithm, the so-called

data dependencies in the computations must be analyzed (Moldovan & Fortes, 1986), (Kung,

1988). Formally, these dependencies are to be expressed via the corresponding dependence

graph (DG). Following (Kung, 1988), we define the dependence graph G =[ P , E ] as a

composite set where P represents the nodes and E represents the arcs or edges in which each

e  E connects

  . Next, the data dependencies

analysis of the matrix-vector multiplication algorithms should be performed aimed at their

efficient parallelization.

For example, the matrix-vector multiplication of an n × m matrix A with a vector x of

dimension

pp  P that is represented as

ep

p

12

1

2

m,

given

by

y = Ax ,

can

be

algorithmically

computed

as

n



, where y and

a

represents an n- dimensional ( n - D ) output

y



a x

,

for

j



1,...,

m

j

ji

i

ji

i



1

vector and the corresponding element of A , respectively. The first SW-level transformation

is the so-called single assignment algorithm (Kung, 1988), (Castillo Atoche et al., 2010b) that

performs the computing of the matrix-vector product. Such single assignment algorithm

corresponds to a loop unrolling method in which the primary benefit in loop unrolling is to

Next Page

Applications of Digital Signal Processing

Search WWH ::

Custom Search

Home