Coarse-Grained Reconfigurable Array Architectures - Signal Processing Systems

supporting control flow, the instruction memory organization can be simplified.

In statically reconfigurable CGRAs, this memory is nothing more than a set of

configuration bits that remain fixed for the whole execution of a loop. Clearly this

is very energy-efficient. Other CGRAs, called dynamically reconfigurable CGRAs,

feature a form of distributed level-0 loop buffers [43] or other small controllers

that fetch new configurations every cycle from simple configuration buffers. To

support loops that include control flow and conditional operations, the compiler

then replaces that control flow by data flow by means of predication [45] or other

mechanisms. In this way, CGRAs differ from VLIW processors, which typically feature

a power-hungry combination of an instruction cache, instruction decompression

and decoding pipeline stages and a non-trivial update mechanism of the program

counter.
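
The replacement of control flow by data flow can be illustrated with a minimal sketch of predication (if-conversion). The function names and the thresholding loop body are hypothetical and chosen only for illustration; they are not taken from any particular CGRA compiler. In the predicated version both sides of the branch are computed unconditionally, and the predicate selects the result arithmetically, much as a multiplexer would in hardware.

```python
def loop_with_branch(xs, threshold):
    """Original loop body: contains control flow (an if/else branch)."""
    out = []
    for x in xs:
        if x > threshold:
            y = x - threshold
        else:
            y = 0
        out.append(y)
    return out


def loop_predicated(xs, threshold):
    """If-converted loop body: the branch is replaced by data flow."""
    out = []
    for x in xs:
        p = int(x > threshold)      # predicate computed as an ordinary value (0 or 1)
        y_then = x - threshold      # "then" side, always computed
        y_else = 0                  # "else" side, always computed
        y = p * y_then + (1 - p) * y_else  # select; stands in for a hardware mux
        out.append(y)
    return out


if __name__ == "__main__":
    data = [1, 5, 3, 9]
    print(loop_with_branch(data, 3))
    print(loop_predicated(data, 3))
```

Because the predicated body has no branch, the same fixed sequence of operations runs in every iteration, which is what allows the instruction memory to remain a simple configuration buffer.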

There are two main drawbacks to CGRA architectures. Firstly, because they can

only execute loops, they need to be coupled to other cores on which all other parts

of the program are executed. In some designs, this coupling introduces run-time

and design-time overhead. Secondly, as clearly visible in the example CGRA of

Fig. 2, the interconnect structure of a CGRA is vastly more complex than that of

a VLIW. On a VLIW, scheduling an instruction in some IS automatically implies

the reservation of connections between the RF and the IS and of the corresponding

ports. On CGRAs, this is not the case. Because there is no one-to-one mapping

between connections and input/output ports of ISs and RFs, connections need to

be reserved explicitly by the compiler or programmer together with ISs, and the

data flow needs to be routed explicitly over the available connections. This can

be done, for example, by programming switches and multiplexers (muxes)

explicitly, like the ones depicted in Fig. 2b. Consequently, more complex compiler

technology than that of VLIW compilers is needed to automate the mapping of code

onto a CGRA. Moreover, writing assembly code for CGRAs ranges from being very

difficult to virtually impossible, depending on the type of reconfigurability and on

the form of processor control.
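
To make the explicit reservation of connections more concrete, the following is a minimal sketch under simplifying assumptions: the ISs form a small mesh with directed nearest-neighbour connections, every connection can carry one value, and a single configuration is considered. The helper names `mesh_connections` and `route` are hypothetical, and the breadth-first search is just one simple routing strategy, not the method of any specific CGRA compiler.

```python
from collections import deque


def mesh_connections(rows, cols):
    """Return the directed nearest-neighbour connections of a rows x cols mesh of ISs."""
    conns = set()
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((0, 1), (1, 0), (0, -1), (-1, 0)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    conns.add(((r, c), (nr, nc)))
    return conns


def route(src, dst, conns, reserved):
    """Find a path from src to dst over unreserved connections (BFS).

    On success the connections on the path are added to `reserved` and the
    path is returned as a list of (from, to) pairs; otherwise None is returned.
    """
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break
        for a, b in sorted(conns):
            if a == node and (a, b) not in reserved and b not in parent:
                parent[b] = node
                queue.append(b)
    if dst not in parent:
        return None  # no free route: the mapping must be changed
    path = []
    node = dst
    while parent[node] is not None:
        path.append((parent[node], node))
        node = parent[node]
    path.reverse()
    reserved.update(path)  # explicit reservation of the connections used
    return path


if __name__ == "__main__":
    conns = mesh_connections(2, 2)
    reserved = set()
    for _ in range(3):
        print(route((0, 0), (1, 1), conns, reserved))
```

In this 2x2 sketch the first two values travel over disjoint paths, while a third request fails because both outgoing connections of the producing IS are already reserved. A VLIW scheduler never faces this situation, since choosing an IS there implies the connections to the RF.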

Having explained these fundamental concepts that differentiate CGRAs from

VLIWs, we can now also differentiate them from Field-Programmable Gate Arrays

(FPGAs), from which the name CGRA actually derives. Whereas FPGAs feature

bitwise logic in the form of Look-Up Tables (LUTs) and switches, CGRAs feature

more energy-efficient and area-conscious word-wide ISs, RFs and interconnections.

Hence the name coarse-grained reconfigurable array architecture. As there are far fewer ISs on

a CGRA than there are LUTs on an FPGA, the number of bits required to configure

the CGRA ISs, muxes, and RF ports is typically orders of magnitude smaller than

on FPGAs. If this number becomes small enough, dynamic reconfiguration becomes

possible every cycle. In short, CGRAs can be seen as statically or dynamically

reconfigurable coarse-grained FPGAs, or as 2D, highly clustered, loop-only VLIWs

with direct interconnections between ISs that need to be programmed explicitly.
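
As a rough illustration of why the configuration of a CGRA can be small, the sketch below counts the bits of one configuration for a purely hypothetical array. All parameters (16 ISs, 32 opcodes, two 8-input muxes per IS) are assumptions made for the example, RF port selection is ignored, and the only fact relied upon is that choosing one of n alternatives takes ceil(log2(n)) bits.

```python
from math import ceil, log2


def select_bits(n_choices):
    """Bits needed to select one of n_choices alternatives."""
    return ceil(log2(n_choices)) if n_choices > 1 else 0


def config_bits_per_context(n_is, n_opcodes, muxes_per_is, mux_inputs):
    """Configuration bits for one context of a hypothetical CGRA (RF ports ignored)."""
    per_is = select_bits(n_opcodes) + muxes_per_is * select_bits(mux_inputs)
    return n_is * per_is


if __name__ == "__main__":
    # Hypothetical parameters, chosen only for illustration.
    bits = config_bits_per_context(n_is=16, n_opcodes=32, muxes_per_is=2, mux_inputs=8)
    print(bits)       # 16 * (5 + 2 * 3) = 176 bits per configuration
    print(bits * 64)  # storage for 64 configurations in a configuration buffer
```

With a configuration of this size, fetching a new one every cycle from a small buffer is plausible, which is the situation the dynamically reconfigurable CGRAs described above rely on.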
