perform more computations per iteration. Unrolling also reduces the overall number of
branches significantly and gives the processor more instructions between branches (i.e., it
increases the size of basic blocks).
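To make the effect concrete, here is a minimal C sketch of unrolling by a factor of four (the dot-product kernel, the names, and the factor are illustrative assumptions, not taken from this text):

```c
#include <stddef.h>

/* Baseline: one multiply-accumulate and one branch test per iteration. */
float dot_rolled(const float *a, const float *b, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++)
        acc += a[i] * b[i];
    return acc;
}

/* Unrolled by 4: four independent multiply-accumulates between branches,
 * which enlarges the basic block and cuts the branch count roughly 4x. */
float dot_unrolled(const float *a, const float *b, size_t n)
{
    float acc0 = 0.0f, acc1 = 0.0f, acc2 = 0.0f, acc3 = 0.0f;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        acc0 += a[i]     * b[i];
        acc1 += a[i + 1] * b[i + 1];
        acc2 += a[i + 2] * b[i + 2];
        acc3 += a[i + 3] * b[i + 3];
    }
    for (; i < n; i++)            /* remainder iterations */
        acc0 += a[i] * b[i];
    return (acc0 + acc1) + (acc2 + acc3);
}
```

The four separate accumulators also break the serial dependence on a single accumulator, which is what lets the extra instructions between branches actually execute in parallel.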
Next, we examine the computation-related optimizations, followed by the memory optimizations. Typically, when we work with nests of loops, we are working with multidimensional arrays. Computing in multidimensional arrays can lead to non-unit-stride memory access. Many of the optimizations can be performed on loop nests to improve the memory access patterns. The second SW-level transformation consists of transforming the matrix-vector single-assignment algorithm into the locally recursive algorithm representation without global data dependencies (i.e., into a recursive form). At this stage, nested-loop optimizations are employed in order to avoid large routing resources, which would translate into a large number of buffers in the final processor array architecture. The variable being broadcast in single-assignment algorithms is removed by passing the variable through each of the neighbour processing elements (PEs) in a DG representation, as sketched below.
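A minimal C sketch of this localization for the matrix-vector case (the names `y`, `A`, `x`, `xp` and the size `N` are illustrative assumptions): in the broadcast form every row i reads x[j] directly, a global dependency; in the localized form the value travels PE-to-PE through a pipeline variable.

```c
#define N 8   /* illustrative problem size */

/* Broadcast form: x[j] is read by every i (global data dependency). */
void matvec_broadcast(float y[N], const float A[N][N], const float x[N])
{
    for (int i = 0; i < N; i++) {
        y[i] = 0.0f;
        for (int j = 0; j < N; j++)
            y[i] += A[i][j] * x[j];     /* x[j] broadcast to all i */
    }
}

/* Localized (recursive) form: each PE (i,j) receives x[j] from its
 * neighbour (i-1,j) and forwards it, removing the broadcast. */
void matvec_localized(float y[N], const float A[N][N], const float x[N])
{
    float xp[N + 1][N];                 /* local copies passed between PEs */
    for (int j = 0; j < N; j++)
        xp[0][j] = x[j];                /* inject x at the array boundary */
    for (int i = 0; i < N; i++) {
        y[i] = 0.0f;
        for (int j = 0; j < N; j++) {
            xp[i + 1][j] = xp[i][j];    /* pass x[j] on to PE (i+1,j) */
            y[i] += A[i][j] * xp[i][j];
        }
    }
}
```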
Additionally, loop interchange techniques for rearranging a loop nest are also applied. For performance, the interchange of inner and outer loops is performed to pull the computations into the center loop, where the unrolling is implemented, as the sketch below illustrates.
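A short C sketch of the interchange (assuming C's row-major layout; the copy kernel and names are illustrative):

```c
#define M 64
#define K 64

/* Before: the inner loop walks down a column of the row-major arrays,
 * so consecutive iterations touch memory K floats apart. */
void copy_strided(float B[M][K], const float C[M][K])
{
    for (int j = 0; j < K; j++)
        for (int i = 0; i < M; i++)
            B[i][j] = C[i][j];          /* stride K: poor locality */
}

/* After interchange: the inner loop walks along a row, so every access
 * is unit-stride; this inner loop is then the natural place to unroll. */
void copy_unit_stride(float B[M][K], const float C[M][K])
{
    for (int i = 0; i < M; i++)
        for (int j = 0; j < K; j++)
            B[i][j] = C[i][j];          /* stride 1: unit-stride access */
}
```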
3.1.4 Architecture design onto MPPAs
Massively parallel co-processors are typically part of a heterogeneous hardware/software system. Each co-processor is a massively parallel system consisting of an array of PEs. In this study, we propose the MPPA architecture for the selected reconstructive SP matrix-vector operation. This architecture is first modelled as a Processor Array (PA), and next, each processor is itself implemented as an array of PEs (i.e., in a highly pipelined bit-level representation). Thus, we achieve the pursued MPPA architecture by following the space-time mapping procedures.
First, some fundamental, previously proven propositions are given in order to clarify the mapping procedure onto PAs.
Proposition 1. There are types of algorithms that can be expressed in terms of a regular and localized DG: for example, basic algebraic matrix-form operations and discrete integral transforms such as convolution, correlation, and digital filtering, all of which can also be represented in matrix format (Moldovan & Fortes, 1986), (Kung, 1988).
Proposition 2. As the DEDR algorithms can be considered properly ordered sequences of vector-matrix multiplication procedures, they can be performed in a computationally efficient fashion following the PA-oriented HW/SW co-design paradigm (Kung, 1988).
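To make Proposition 2 concrete, the following schematic C sketch treats a DEDR estimate as a properly ordered chain of matrix-vector products; the factor operators `F1`, `F2` and the size `N` are hypothetical placeholders, not the actual DEDR operators:

```c
#define N 16  /* illustrative dimension */

/* The single kernel the PA must implement: y = A * x. */
static void matvec(float y[N], const float A[N][N], const float x[N])
{
    for (int i = 0; i < N; i++) {
        y[i] = 0.0f;
        for (int j = 0; j < N; j++)
            y[i] += A[i][j] * x[j];
    }
}

/* A DEDR-style estimate b_hat = F2 * (F1 * y): each stage is the same
 * matvec kernel, so the whole sequence maps onto one PA design. */
void dedr_estimate(float b_hat[N], const float F1[N][N],
                   const float F2[N][N], const float y[N])
{
    float t[N];
    matvec(t, F1, y);       /* first stage  */
    matvec(b_hat, F2, t);   /* second stage */
}
```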
Following the propositions presented above, we are ready to derive the proper PA architectures. Moldovan & Fortes (1986) proved the mapping theory for the transformation $T$. The transformation $T: G^N \to \hat{G}^{N-1}$ maps the $N$-dimensional DG ($G^N$) onto the $(N-1)$-dimensional PA ($\hat{G}^{N-1}$), where $N$ represents the dimension of the DG (see proofs in (Kung, 1988) and details in (Castillo-Atoche et al., 2010b)). Second, the desired linear transformation matrix operator $T$ can be segmented into two blocks as follows

$$T = \begin{bmatrix} \Pi \\ \Sigma \end{bmatrix}, \qquad (24)$$
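As a worked instance of (24) (a standard textbook choice, assumed here rather than taken from this text, with $\Pi$ giving the time schedule and $\Sigma$ the processor allocation): for the 2-D DG of a matrix-vector multiply one may pick

$$T = \begin{bmatrix} \Pi \\ \Sigma \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}, \qquad \begin{bmatrix} t \\ p \end{bmatrix} = T \begin{bmatrix} i \\ j \end{bmatrix} = \begin{bmatrix} i + j \\ j \end{bmatrix},$$

so that DG node $(i,j)$ executes at time step $t = i + j$ on PE $p = j$, projecting the 2-D DG onto a 1-D (linear) processor array.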