Coarse-Grained Reconfigurable Array Architectures - Signal Processing Systems


arrays can be used that are still power-efficient. The disadvantage is that even in

large arrays the amount of resources constrains which loops can be mapped.

Dynamically reconfigurable CGRAs can overcome this problem by spreading

the computations of a loop iteration over multiple configurations. Thus a small

dynamically reconfigurable array can execute larger loops. The loop size is then not

limited by the array size, but by the array size times the depth of the reconfiguration

memories. For reasons of power efficiency, this depth is also limited, typically to

tens or hundreds of configurations, which suffices for most if not all inner loops.
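
The capacity bound described above can be made concrete with a small calculation. The sketch below is illustrative only: the function names and the 4x4 array with a 64-entry configuration memory are assumptions, not figures from any of the designs discussed, and the bound ignores routing and dependency constraints.

```python
def max_loop_operations(rows: int, cols: int, config_depth: int) -> int:
    """Upper bound on the operations in one loop body: at most one
    operation per issue slot per stored configuration."""
    return rows * cols * config_depth


def configurations_needed(num_ops: int, rows: int, cols: int) -> int:
    """Minimum number of configurations over which a loop body of
    num_ops operations must be spread (ceiling division)."""
    slots = rows * cols
    return -(-num_ops // slots)


# Hypothetical 4x4 array with a configuration memory 64 entries deep.
print(max_loop_operations(4, 4, 64))     # 1024
print(configurations_needed(100, 4, 4))  # 7
```

A statically reconfigurable array corresponds to a depth of 1, so the same 4x4 array would then be limited to loop bodies of at most 16 operations.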

A potential disadvantage of dynamically reconfigurable CGRAs is the power

consumption of the configuration memories, even for small arrays, and of the

configuration fetching mechanism. The disadvantage can be tackled in different

ways. ADRES and MorphoSys tackle it by not allowing control flow in the loop

bodies, thus enabling the use of very simple, power-efficient configuration fetching

techniques similar to level-0 loop buffering [43]. Whenever control flow is found in

loop bodies, for example in conditional statements, it first needs to

be converted into data flow, for instance by means of predication and hyperblock

formation [45]. While these techniques can introduce some initial overhead in the

code, this overhead typically will be more than compensated by the fact that a more

efficient CGRA design can be used.
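
To illustrate the conversion of control flow into data flow, the sketch below rewrites a conditional in a loop body as a predicate value plus a select operation. It is a simplified software model of if-conversion, not the compiler algorithm used for ADRES or MorphoSys; the saturation example and all function names are assumptions chosen for illustration.

```python
def saturate_branchy(x: int, limit: int) -> int:
    # Original form: the conditional is control flow inside the loop body.
    if x > limit:
        y = limit
    else:
        y = x
    return y


def select(pred: bool, a: int, b: int) -> int:
    """Branch-free select: both inputs are already computed and the
    predicate, treated as ordinary data, picks one of them."""
    p = int(pred)
    return p * a + (1 - p) * b


def saturate_predicated(x: int, limit: int) -> int:
    # If-converted form: the comparison result is a data value and the
    # body is a straight-line sequence of operations.
    p = x > limit
    return select(p, limit, x)


samples = [-3, 0, 7, 10, 11, 250]
print([saturate_predicated(x, 10) for x in samples])
```

Both versions compute the same result. The predicated version always evaluates both alternatives, which is the kind of overhead the text refers to, but every iteration then follows the same configuration sequence.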

The MorphoSys design takes this reduction of the configuration fetching logic

even further by limiting the supported code to Single Instruction Multiple Data

(SIMD) code. In the two supported SIMD modes, all ISs in a row or all ISs in a

column perform identical operations. As such only one IS configuration needs to

be fetched per row or column. As already mentioned, the RaPiD architecture limits

the number of configuration bits to be fetched by making only a small part of the

configuration dynamically reconfigurable. Kim et al. provide yet another solution

in which the configuration bits of one column in one cycle are reused for the next

column in the next cycle [37]. Furthermore, they also propose to reduce the power

consumption in the configuration memories by compressing the configurations [38].
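
The effect of these restrictions on configuration fetching can be sketched with a simple model. The code below counts IS configurations fetched per cycle for an unrestricted array and for the two SIMD modes, and models the column-to-column reuse as a shift register. The array dimensions, mode names and function names are assumptions made for illustration; the model follows only the description given here, not the actual hardware of MorphoSys or of Kim et al.

```python
from collections import deque


def fetches_per_cycle(rows: int, cols: int, mode: str) -> int:
    """Number of IS configurations fetched per cycle in a rows x cols array."""
    if mode == "independent":   # every IS gets its own configuration
        return rows * cols
    if mode == "simd_row":      # all ISs in a row share one configuration
        return rows
    if mode == "simd_column":   # all ISs in a column share one configuration
        return cols
    raise ValueError(f"unknown mode: {mode}")


def column_reuse_schedule(column_configs, cols: int):
    """Yield, per cycle, the configuration held by each column when every
    column passes its configuration on to the next column in the next cycle.
    Only the first column receives a newly fetched configuration."""
    pipeline = deque([None] * cols, maxlen=cols)
    for cfg in column_configs:
        pipeline.appendleft(cfg)  # the oldest entry drops off the far end
        yield list(pipeline)


# Hypothetical 8x8 array.
for mode in ("independent", "simd_row", "simd_column"):
    print(mode, fetches_per_cycle(8, 8, mode))

for cycle, state in enumerate(column_reuse_schedule(["A", "B", "C"], cols=3)):
    print(cycle, state)
```

In the shift-register model, only one column's worth of configuration bits enters the array per cycle, while the remaining columns reuse bits that were fetched earlier.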

Still, dynamically reconfigurable designs exist that put no restrictions on the code

to be executed, and that even allow control flow in the inner loops. The Silicon Hive

design is one such design. Unfortunately, no numbers on the power consumption

overhead of this design choice are publicly available.

A general rule is that limited reconfigurability puts more constraints on the

types and sizes of loops that can be mapped. Which design provides the highest

performance or the highest energy efficiency depends, among other factors, on the

variation in loop complexity and loop size present in the applications to be mapped

onto the CGRA. With large statically reconfigurable CGRAs, it is only possible to

achieve high utilization for all loops in an application if all those loops have similar

complexity and size, or if they can be made so with loop transformations, and if the

iterations are not dependent on each other through long-latency dependency cycles

(as was the case in Fig. 5). Dynamically reconfigurable CGRAs, by contrast, can

also achieve high average utilization over loops of varying sizes and complexities,

and with inter-iteration dependencies. That way dynamically reconfigurable CGRAs

can achieve higher energy efficiency in the data path, at the expense of higher energy
