Digital Signal Processing Reference
arrays can be used that are still power-efficient. The disadvantage is that even in the
large arrays the amount of resources constrains which loops can be mapped.
Dynamically reconfigurable CGRAs can overcome this problem by spreading
the computations of a loop iteration over multiple configurations. Thus a small
dynamically reconfigurable array can execute larger loops. The loop size is then not
limited by the array size, but by the array size times the depth of the reconfiguration
memories. For reasons of power efficiency, this depth is also limited, typically to
tens or hundreds of configurations, which suffices for most if not all inner loops.
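
The capacity argument above can be illustrated with a rough back-of-the-envelope sketch. The function name and the example sizes (an 8x8 static array, a 4x4 array with 64 configurations) are illustrative assumptions, not figures for any specific CGRA. The model ignores routing and scheduling constraints, so it gives an upper bound only.

```python
def max_loop_ops(rows: int, cols: int, config_depth: int = 1) -> int:
    """Upper bound on the operations of one loop iteration the array can hold.

    config_depth = 1 models a statically reconfigurable array, which keeps a
    single configuration for the whole loop. Larger values model a dynamically
    reconfigurable array that cycles through several stored configurations.
    """
    if rows <= 0 or cols <= 0 or config_depth <= 0:
        raise ValueError("all parameters must be positive")
    return rows * cols * config_depth


# Hypothetical sizes, chosen only to show the effect of configuration depth.
static_limit = max_loop_ops(8, 8)                     # large static array
dynamic_limit = max_loop_ops(4, 4, config_depth=64)   # small dynamic array
print(static_limit, dynamic_limit)
```

Under these assumed numbers, the smaller dynamically reconfigurable array can hold a considerably larger loop body than the larger static one, because the bound scales with the configuration depth.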
A potential disadvantage of dynamically reconfigurable CGRAs is the power
consumption of the configuration memories, even for small arrays, and of the
configuration fetching mechanism. This disadvantage can be tackled in different
ways. ADRES and MorphoSys tackle it by not allowing control flow in the loop
bodies, thus enabling the use of very simple, power-efficient configuration fetching
techniques similar to level-0 loop buffering [43]. Whenever control flow occurs in
loop bodies, for example in conditional statements, it must first
be converted into data flow, for example by means of predication and hyperblock
formation [45]. While these techniques can introduce some initial overhead in the
code, this overhead is typically more than compensated for by the fact that a more
efficient CGRA design can be used.
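
The conversion of control flow into data flow can be sketched with a small clipping loop. This is a minimal illustration of predication only (not of hyperblock formation), and the function names are hypothetical. In the predicated version the comparison result is treated as a data value that selects between two already available operands, so the loop body contains no branch.

```python
def clip_branchy(samples, limit):
    """Loop body with control flow: an if/else decides which value is stored."""
    out = []
    for x in samples:
        if x > limit:
            y = limit
        else:
            y = x
        out.append(y)
    return out


def clip_predicated(samples, limit):
    """Same computation with the branch replaced by a predicate (if-conversion)."""
    out = []
    for x in samples:
        p = int(x > limit)             # predicate computed as data: 0 or 1
        y = p * limit + (1 - p) * x    # both operands are used, p selects one
        out.append(y)
    return out


data = [1, 5, 9, 4]
print(clip_branchy(data, 4), clip_predicated(data, 4))
```

The predicated loop always performs the work for both outcomes, which is the overhead mentioned above; in exchange, every iteration executes the same operation sequence, which is what allows a simple configuration fetch mechanism.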
The MorphoSys design takes this reduction of the configuration fetching logic
even further by limiting the supported code to Single Instruction Multiple Data
(SIMD) code. In the two supported SIMD modes, all ISs in a row or all ISs in a
column perform identical operations. As such only one IS configuration needs to
be fetched per row or column. As already mentioned, the RaPiD architecture limits
the number of configuration bits to be fetched by making only a small part of the
configuration dynamically reconfigurable. Kim et al. provide yet another solution
in which the configuration bits of one column in one cycle are reused for the next
column in the next cycle [37]. Furthermore, they also propose to reduce the power
consumption in the configuration memories by compressing the configurations [38].
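
The saving from the SIMD modes described above can be sketched by counting how many IS configurations must be fetched per cycle. This is a simplified counting model under stated assumptions: the mode names, the function name, and the 4x8 array size are hypothetical, and the model does not describe the actual MorphoSys fetch hardware.

```python
def configs_fetched_per_cycle(rows: int, cols: int, mode: str = "full") -> int:
    """Number of distinct IS configurations fetched in one cycle.

    "full"     : every IS gets its own configuration
    "row_simd" : all ISs in a row share one configuration
    "col_simd" : all ISs in a column share one configuration
    """
    if mode == "full":
        return rows * cols
    if mode == "row_simd":
        return rows
    if mode == "col_simd":
        return cols
    raise ValueError(f"unknown mode: {mode}")


# Hypothetical 4x8 array, used only to compare the three cases.
for mode in ("full", "row_simd", "col_simd"):
    print(mode, configs_fetched_per_cycle(4, 8, mode))
```

In this model the number of fetched configurations drops from the product of the array dimensions to a single dimension, at the cost of restricting the code to SIMD form.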
Still, dynamically reconfigurable designs exist that put no restrictions on the code
to be executed, and that even allow control flow in the inner loops. The Silicon Hive
design is one such design. Unfortunately, no numbers on the power consumption
overhead of this design choice are publicly available.
As a general rule, limited reconfigurability puts more constraints on the
types and sizes of loops that can be mapped. Which design provides the highest
performance or the highest energy efficiency depends, among other factors, on the
variation in loop complexity and loop size present in the applications to be mapped
onto the CGRA. With large statically reconfigurable CGRAs, it is only possible to
achieve high utilization for all loops in an application if all those loops have similar
complexity and size, or if they can be made so with loop transformations, and if the
iterations are not dependent on each other through long-latency dependency cycles
(as was the case in Fig. 5). Dynamically reconfigurable CGRAs, by contrast, can
also achieve high average utilization over loops of varying sizes and complexities,
and with inter-iteration dependencies. That way dynamically reconfigurable CGRAs
can achieve higher energy efficiency in the data path, at the expense of higher energy