Digital Signal Processing Reference
In-Depth Information
supporting control flow, the instruction memory organization can be simplified.
In statically reconfigurable CGRAs, this memory is nothing more than a set of
configuration bits that remain fixed for the whole execution of a loop. Clearly this
is very energy-efficient. Other CGRAs, called dynamically reconfigurable CGRAs,
feature a form of distributed level-0 loop buffers [43] or other small controllers
that fetch new configurations every cycle from simple configuration buffers. To
support loops that include control flow and conditional operations, the compiler
then replaces that control flow by data flow by means of predication [45] or other
mechanisms. In this way CGRAs differ from VLIW processors that typically feature
a power-hungry combination of an instruction cache, instruction decompression
and decoding pipeline stages and a non-trivial update mechanism of the program
counter.
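As a minimal sketch of what replacing control flow by data flow means, the Python fragment below contrasts a loop body that branches with an if-converted (predicated) version. The function names `branchy`, `select` and `predicated` are hypothetical and chosen only for illustration; `select` stands in for a hardware multiplexer, and the example does not model any particular CGRA or the specific scheme of [45].

```python
def branchy(xs, threshold):
    """Loop body with control flow: only one of the two paths runs per iteration."""
    out = []
    for x in xs:
        if x > threshold:
            out.append(x - threshold)
        else:
            out.append(0)
    return out


def select(pred, a, b):
    """Software stand-in for a hardware mux: both inputs already exist as data."""
    return a if pred else b


def predicated(xs, threshold):
    """If-converted loop body: same operations every iteration, no branch."""
    out = []
    for x in xs:
        p = x > threshold      # the condition becomes an ordinary data value
        t = x - threshold      # "then" value, computed unconditionally
        e = 0                  # "else" value, computed unconditionally
        out.append(select(p, t, e))
    return out
```

Both versions return the same results; in the second, every iteration performs the same fixed sequence of operations, which is the property that lets a loop body run from a fixed or simply sequenced configuration.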
There are two main drawbacks to CGRA architectures. Firstly, because they can
only execute loops, they need to be coupled to other cores on which all other parts
of the program are executed. In some designs, this coupling introduces run-time
and design-time overhead. Secondly, as clearly visible in the example CGRA of
Fig. 2, the interconnect structure of a CGRA is vastly more complex than that of
a VLIW. On a VLIW, scheduling an instruction in some IS automatically implies
the reservation of connections between the RF and the IS and of the corresponding
ports. On CGRAs, this is not the case. Because there is no one-to-one mapping
between connections and input/output ports of ISs and RFs, connections need to
be reserved explicitly by the compiler or programmer together with ISs, and the
data flow needs to be routed explicitly over the available connections. This can
be done, for example, by programming switches and multiplexors (a.k.a. muxes)
explicitly, like the ones depicted in Fig. 2b. Consequently, more complex compiler
technology than that of VLIW compilers is needed to automate the mapping of code
onto a CGRA. Moreover, writing assembly code for CGRAs ranges from being very
difficult to virtually impossible, depending on the type of reconfigurability and on
the form of processor control.
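The explicit reservation of connections can be illustrated with a deliberately simplified sketch. It assumes a hypothetical mesh in which each IS has directed links to its nearest neighbours and each link can carry one value; `mesh_links` and `route` are illustrative names, not part of any real CGRA tool flow. A real mapper would also have to account for time (which cycle a value occupies a link), RF ports and mux settings, all of which are omitted here.

```python
from collections import deque


def mesh_links(rows, cols):
    """Directed nearest-neighbour links of a hypothetical rows x cols mesh of ISs."""
    links = set()
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((0, 1), (1, 0), (0, -1), (-1, 0)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    links.add(((r, c), (nr, nc)))
    return links


def route(src, dst, links, reserved):
    """Breadth-first search for a path from src to dst over unreserved links.

    On success the links along the path are added to `reserved` and the
    path (a list of nodes) is returned. On failure None is returned and
    `reserved` is left unchanged.
    """
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break
        for a, b in sorted(links):  # sorted only to make results deterministic
            if a == node and b not in parent and (a, b) not in reserved:
                parent[b] = a
                queue.append(b)
    if dst not in parent:
        return None
    path = []
    node = dst
    while node is not None:
        path.append(node)
        node = parent[node]
    path.reverse()
    reserved.update(zip(path, path[1:]))  # reserve every link on the path
    return path
```

Routing the same source and destination twice on a 2 x 2 mesh shows why reservation matters: the first value takes the direct link, the second is forced onto a detour through the other two ISs, and a third finds no free link at all and cannot be routed.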
Having explained these fundamental concepts that differentiate CGRAs from
VLIWs, we can now also differentiate them from Field-Programmable Gate Arrays
(FPGAs), from which the name CGRA is actually derived. Whereas FPGAs feature
bitwise logic in the form of Look-Up Tables (LUTs) and switches, CGRAs feature
more energy-efficient and area-conscious word-wide ISs, RFs and interconnections.
Hence the name coarse-grained array architecture. As there are far fewer ISs on
a CGRA than there are LUTs on an FPGA, the number of bits required to configure
the CGRA ISs, muxes, and RF ports is typically orders of magnitude smaller than
on FPGAs. If this number becomes small enough, dynamic reconfiguration becomes
possible every cycle. In short, CGRAs can be seen as statically or dynamically
reconfigurable coarse-grained FPGAs, or as 2D, highly-clustered loop-only VLIWs
with direct interconnections between ISs that need to be programmed explicitly.
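A rough way to see where the difference in configuration size comes from is to count bits per configurable element. The sketch below relies only on two general facts: a k-input LUT stores 2^k truth-table bits, and choosing one of n alternatives takes ceil(log2 n) bits. The IS parameters used in the example (32 opcodes, two operand muxes with 8 inputs each) are hypothetical values picked for illustration, not figures from the text or from any real device, and RF ports and routing switches are left out.

```python
def select_bits(n_choices):
    """Bits needed to pick one of n_choices alternatives (ceil(log2 n))."""
    return (n_choices - 1).bit_length()


def lut_config_bits(k_inputs):
    """Truth-table bits of one k-input LUT, which implements a 1-bit function."""
    return 2 ** k_inputs


def is_config_bits(n_opcodes, n_operands, mux_inputs):
    """Opcode field plus one mux-select field per operand of a word-wide IS."""
    return select_bits(n_opcodes) + n_operands * select_bits(mux_inputs)


# Hypothetical example: one word-wide IS versus one 4-input LUT.
bits_per_is = is_config_bits(n_opcodes=32, n_operands=2, mux_inputs=8)
bits_per_lut = lut_config_bits(4)
```

Under these assumptions a single IS is configured with 11 bits and performs a whole word-wide operation, while a single 4-input LUT needs 16 bits to define one bit of logic, so an FPGA implementing comparable word-wide datapaths needs many LUTs, plus their routing configuration, for each operation.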