perform more computations per iteration. Unrolling also reduces the overall number of
branches significantly and gives the processor more instructions between branches (i.e., it
increases the size of basic blocks).
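To make the effect concrete, here is a minimal C sketch of unrolling by a factor of four (the dot-product kernel, the names, and the factor are illustrative assumptions, not taken from this text):

```c
#include <stddef.h>

/* Baseline: one multiply-accumulate and one branch test per iteration. */
float dot_rolled(const float *a, const float *b, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++)
        acc += a[i] * b[i];
    return acc;
}

/* Unrolled by 4: four independent multiply-accumulates between branches,
 * which enlarges the basic block and cuts the branch count roughly 4x. */
float dot_unrolled(const float *a, const float *b, size_t n)
{
    float acc0 = 0.0f, acc1 = 0.0f, acc2 = 0.0f, acc3 = 0.0f;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        acc0 += a[i]     * b[i];
        acc1 += a[i + 1] * b[i + 1];
        acc2 += a[i + 2] * b[i + 2];
        acc3 += a[i + 3] * b[i + 3];
    }
    for (; i < n; i++)            /* remainder iterations */
        acc0 += a[i] * b[i];
    return (acc0 + acc1) + (acc2 + acc3);
}
```

The four separate accumulators also break the serial dependence on a single accumulator, which is what lets the extra instructions between branches actually execute in parallel.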
Next, we examine the computation-related optimizations, followed by the memory optimizations. Typically, when we work with nests of loops, we are working with multidimensional arrays. Computing in multidimensional arrays can lead to non-unit-stride memory access. Many of the optimizations can be performed on loop nests to improve the memory access patterns. The second SW-level transformation consists of transforming the matrix-vector single-assignment algorithm into the locally recursive algorithm representation without global data dependencies (i.e., into a recursive form). At this stage, nested-loop optimizations are employed in order to avoid large routing resources, which would translate into a large number of buffers in the final processor array architecture. The variable being broadcast in single-assignment algorithms is removed by passing the variable through each of the neighbour processing elements (PEs) in a DG representation, as sketched below.
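A minimal C sketch of this localization for the matrix-vector case (the names `y`, `A`, `x`, `xp` and the size `N` are illustrative assumptions): in the broadcast form every row i reads x[j] directly, a global dependency; in the localized form the value travels PE-to-PE through a pipeline variable.

```c
#define N 8   /* illustrative problem size */

/* Broadcast form: x[j] is read by every i (global data dependency). */
void matvec_broadcast(float y[N], const float A[N][N], const float x[N])
{
    for (int i = 0; i < N; i++) {
        y[i] = 0.0f;
        for (int j = 0; j < N; j++)
            y[i] += A[i][j] * x[j];     /* x[j] broadcast to all i */
    }
}

/* Localized (recursive) form: each PE (i,j) receives x[j] from its
 * neighbour (i-1,j) and forwards it, removing the broadcast. */
void matvec_localized(float y[N], const float A[N][N], const float x[N])
{
    float xp[N + 1][N];                 /* local copies passed between PEs */
    for (int j = 0; j < N; j++)
        xp[0][j] = x[j];                /* inject x at the array boundary */
    for (int i = 0; i < N; i++) {
        y[i] = 0.0f;
        for (int j = 0; j < N; j++) {
            xp[i + 1][j] = xp[i][j];    /* pass x[j] on to PE (i+1,j) */
            y[i] += A[i][j] * xp[i][j];
        }
    }
}
```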
Additionally, loop interchange techniques for rearranging a loop nest are also applied. For performance, the interchange of inner and outer loops is performed to pull the computations into the center loop, where the unrolling is implemented, as the sketch below illustrates.
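A short C sketch of the interchange (assuming C's row-major layout; the copy kernel and names are illustrative):

```c
#define M 64
#define K 64

/* Before: the inner loop walks down a column of the row-major arrays,
 * so consecutive iterations touch memory K floats apart. */
void copy_strided(float B[M][K], const float C[M][K])
{
    for (int j = 0; j < K; j++)
        for (int i = 0; i < M; i++)
            B[i][j] = C[i][j];          /* stride K: poor locality */
}

/* After interchange: the inner loop walks along a row, so every access
 * is unit-stride; this inner loop is then the natural place to unroll. */
void copy_unit_stride(float B[M][K], const float C[M][K])
{
    for (int i = 0; i < M; i++)
        for (int j = 0; j < K; j++)
            B[i][j] = C[i][j];          /* stride 1: unit-stride access */
}
```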
3.1.4 Architecture design onto MPPAs
Massively parallel co-processors are typically part of a heterogeneous hardware/software system. Each co-processor is a massively parallel system consisting of an array of PEs. In this study, we propose the MPPA architecture for the selected reconstructive SP matrix-vector operation. This architecture is first modelled as a Processor Array (PA), and next, each processor is itself implemented as an array of PEs (i.e., in a highly pipelined bit-level representation). Thus, we achieve the pursued MPPA architecture by following the space-time mapping procedures.
First, some fundamental, previously proven propositions are given in order to clarify the mapping procedure onto PAs.
Proposition 1. There are types of algorithms that can be expressed in terms of a regular and localized DG: for example, basic algebraic matrix-form operations and discrete integral transforms such as convolution, correlation, and digital filtering, all of which can also be represented in matrix format (Moldovan & Fortes, 1986), (Kung, 1988).
Proposition 2. As the DEDR algorithms can be considered properly ordered sequences of vector-matrix multiplication procedures, they can be performed in a computationally efficient fashion following the PA-oriented HW/SW co-design paradigm (Kung, 1988).
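To make Proposition 2 concrete, the following schematic C sketch treats a DEDR estimate as a properly ordered chain of matrix-vector products; the factor operators `F1`, `F2` and the size `N` are hypothetical placeholders, not the actual DEDR operators:

```c
#define N 16  /* illustrative dimension */

/* The single kernel the PA must implement: y = A * x. */
static void matvec(float y[N], const float A[N][N], const float x[N])
{
    for (int i = 0; i < N; i++) {
        y[i] = 0.0f;
        for (int j = 0; j < N; j++)
            y[i] += A[i][j] * x[j];
    }
}

/* A DEDR-style estimate b_hat = F2 * (F1 * y): each stage is the same
 * matvec kernel, so the whole sequence maps onto one PA design. */
void dedr_estimate(float b_hat[N], const float F1[N][N],
                   const float F2[N][N], const float y[N])
{
    float t[N];
    matvec(t, F1, y);       /* first stage  */
    matvec(b_hat, F2, t);   /* second stage */
}
```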
Following the propositions presented above, we are ready to derive the proper PA architectures. Moldovan & Fortes (1986) proved the mapping theory for the transformation $T$. The transformation $T: G^N \to \hat{G}^{N-1}$ maps the $N$-dimensional DG ($G^N$) onto the $(N-1)$-dimensional PA ($\hat{G}^{N-1}$), where $N$ represents the dimension of the DG (see proofs in (Kung, 1988) and details in (Castillo-Atoche et al., 2010b)). Second, the desired linear transformation matrix operator $T$ can be segmented into two blocks as follows

$$T = \begin{bmatrix} \Pi \\ \Sigma \end{bmatrix}, \qquad (24)$$
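As a worked instance of (24) (a standard textbook choice, assumed here rather than taken from this text, with $\Pi$ giving the time schedule and $\Sigma$ the processor allocation): for the 2-D DG of a matrix-vector multiply one may pick

$$T = \begin{bmatrix} \Pi \\ \Sigma \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}, \qquad \begin{bmatrix} t \\ p \end{bmatrix} = T \begin{bmatrix} i \\ j \end{bmatrix} = \begin{bmatrix} i + j \\ j \end{bmatrix},$$

so that DG node $(i,j)$ executes at time step $t = i + j$ on PE $p = j$, projecting the 2-D DG onto a 1-D (linear) processor array.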