Coarse-Grained Reconfigurable Array Architectures - Signal Processing Systems


Fig. 5 The left part shows a spatial mapping of a sequence of four instructions on a statically reconfigurable 2 × 2 CGRA. Edges denote dependencies, with the edge from instruction 3 to instruction 0 denoting that instruction 0 from iteration i depends on instruction 3 from iteration i − 1. So only one out of four ISs is utilized per cycle. The right part shows a temporal mapping of the same code on a dynamically reconfigurable CGRA with only one IS. The utilization is higher here, at 100%.

no reconfiguration takes place during the loop at all. Still other architectures feature a hybrid reconfigurability. The RaPiD [18, 23] architecture features partial dynamic reconfigurability, in which part of the bits are statically reconfigurable and another part is dynamically reconfigurable and controlled by a small sequencer. Yet another example is the PACT architecture, in which the CGRA itself can initiate events that invoke (partial) reconfiguration. This reconfiguration consumes a significant amount of time, however, so it is advised to avoid it if possible, and to use the CGRA as a statically reconfigurable CGRA.

In statically reconfigured CGRAs, each resource performs a single task for the whole duration of the loop. In that case, the mapping of software onto hardware becomes purely spatial, as illustrated in Fig. 5a. In other words, the mapping problem becomes one of placement and routing, in which instructions and data dependencies between instructions have to be mapped onto a 2D array of resources. For these CGRAs, compiler techniques similar to hardware synthesis techniques can be used, such as those used in FPGA placement and routing [7].
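To make the placement view concrete, here is a minimal sketch that places the four instructions of Fig. 5 on a 2 × 2 array by exhaustive search. It rests on several assumptions that are not in the text: the dependencies form the ring 0 → 1 → 2 → 3 → 0, the cost of routing an edge is the Manhattan distance between the two cells, and the names `place` and `FIG5_EDGES` are hypothetical. Real placement and routing tools use heuristics rather than brute force; this only illustrates the shape of the problem.

```python
from itertools import permutations

# Assumed dependency edges for the Fig. 5 example: a chain 0 -> 1 -> 2 -> 3
# plus the loop-carried edge 3 -> 0.
FIG5_EDGES = [(0, 1), (1, 2), (2, 3), (3, 0)]


def manhattan(a, b):
    """Routing cost model (an assumption): grid distance between two cells."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def place(num_instructions, edges, rows, cols):
    """Try every assignment of instructions to distinct cells; keep the cheapest.

    Returns (placement, cost), where placement maps instruction -> (row, col).
    Exhaustive search is only feasible for toy sizes such as this one.
    """
    cells = [(r, c) for r in range(rows) for c in range(cols)]
    best_placement, best_cost = None, None
    for perm in permutations(cells, num_instructions):
        cost = sum(manhattan(perm[src], perm[dst]) for src, dst in edges)
        if best_cost is None or cost < best_cost:
            best_placement, best_cost = dict(enumerate(perm)), cost
    return best_placement, best_cost


placement, cost = place(4, FIG5_EDGES, rows=2, cols=2)
print(placement, cost)
```

Under these assumptions the cheapest placement has total cost 4: each of the four edges connects two neighbouring cells, so every dependency can be routed over a nearest-neighbour link.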

By contrast, dynamic reconfigurability enables the programmer to use hardware resources for multiple different tasks during the execution of a loop or even during the execution of a single loop iteration. In that case, the software mapping problem becomes a spatial and temporal mapping problem, in which the operations and data transfers not only need to be placed and routed on and over the hardware resources, but in which they also need to be scheduled. A contrived example of a temporal mapping is depicted in Fig. 5b. Most compiler techniques [20, 22, 25, 48, 52, 54, 55] for these architectures also originate from the FPGA placement and routing world. For CGRAs, the array of resources is not treated as a 2D spatial array, but as a 3D spatial-temporal array, in which the third dimension models time in the form of execution cycles. Scheduling in this dimension is often based on techniques that combine VLIW scheduling techniques such as modulo scheduling [39, 61], with FPGA synthesis-based techniques [7]. Still other compiler techniques exist that are based on constraint solving [67], or on integer linear programming [2, 41, 77].
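The time dimension can be sketched with a small modulo reservation table, which records which issue slot (IS) is busy in each cycle modulo the initiation interval (II). The code below is a simplified greedy sketch, not the algorithm of any cited compiler. It assumes the Fig. 5 loop is a chain of four unit-latency instructions with a loop-carried dependency of distance 1, and all function names are hypothetical.

```python
from math import ceil


def min_initiation_interval(num_ops, num_units, recurrence_latency, recurrence_distance):
    """Lower bound on the II: the larger of the resource and recurrence bounds."""
    res_mii = ceil(num_ops / num_units)                        # resource bound
    rec_mii = ceil(recurrence_latency / recurrence_distance)   # recurrence bound
    return max(res_mii, rec_mii)


def schedule_chain(num_ops, num_units, ii):
    """Greedily schedule the chain op0 -> op1 -> ... with unit latencies.

    mrt[slot][unit] holds the op that occupies `unit` in cycles where
    cycle % ii == slot. Returns (placement, mrt), with placement mapping
    op -> (cycle, unit).
    """
    mrt = [[None] * num_units for _ in range(ii)]
    placement = {}
    earliest = 0
    for op in range(num_ops):
        for cycle in range(earliest, earliest + ii):
            slot = cycle % ii
            free = [u for u in range(num_units) if mrt[slot][u] is None]
            if free:
                mrt[slot][free[0]] = op
                placement[op] = (cycle, free[0])
                earliest = cycle + 1  # successor may start one cycle later
                break
        else:
            raise ValueError("no free slot found; a larger II is needed")
    return placement, mrt


def utilization(num_ops, num_units, ii):
    """Fraction of (unit, cycle) slots doing useful work in steady state."""
    return num_ops / (num_units * ii)


for units in (1, 4):
    ii = min_initiation_interval(4, units, recurrence_latency=4, recurrence_distance=1)
    placement, _ = schedule_chain(4, units, ii)
    print(units, ii, utilization(4, units, ii), placement)
```

With these assumptions the recurrence forces an II of 4 cycles regardless of the number of ISs, so a single IS is 100% utilized while four ISs reach only 25%, which matches the figures quoted in the caption of Fig. 5.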

The most important advantage of static reconfigurability is the lack of reconfiguration overhead, in particular in terms of power consumption. For that reason, large
