Coarse-Grained Reconfigurable Array Architectures - Signal Processing Systems

consumption in the control path. Which design option is best also depends on the process technology used, in particular on the ability to perform clock or power gating and on the ratio between active and passive power (a.k.a. leakage).

3.2.2 Scheduling and Issuing

With both dynamic and static reconfigurability, the execution of operations and of data transfers needs to be controlled. This can be done statically in a compiler, similar to the way in which operations from static code schedules are scheduled and issued on VLIW processors [24], or dynamically, similar to the way in which out-of-order processors issue instructions when their operands become available [66]. Many combinations of static and dynamic reconfiguration and of static and dynamic scheduling are possible.

A first class consists of dynamically scheduled, dynamically reconfigurable CGRAs like the TRIPS architecture [28, 63]. For this architecture, the compiler determines on which IS each operation is to be executed and over which connections data is to be transferred from one IS to another. So the compiler performs placement and routing. All scheduling (including the reconfiguration) is dynamic, however, as in regular out-of-order superscalar processors [66]. TRIPS mainly targets general-purpose applications, in which unpredictable control flow makes the generation of high-quality static schedules difficult if not impossible. Such applications most often provide relatively limited ILP, for which large arrays of computational resources are not efficient. So instead a small, dynamically reconfigurable array is used, for which the run-time cost of dynamic reconfiguration and scheduling is acceptable.
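
The split between compile-time placement and run-time issue can be illustrated with a toy model. The sketch below is not the TRIPS microarchitecture; it only assumes that each operation has been assigned to an IS by the compiler (read here as an issue slot), and that at run time an operation issues in the first cycle in which all its operands are available and its IS is free. The function name, the operation names, the latencies and the tie-breaking order are all hypothetical.

```python
def dynamic_issue(ops, placement):
    """Toy dataflow issue model (assumes an acyclic graph, latencies >= 1).

    ops:       {op name: (list of producer op names, latency in cycles)}
    placement: {op name: IS id chosen by the compiler}
    Returns    {op name: cycle in which the op issues}
    """
    ready_at = {}      # cycle at which each op's result becomes available
    issue_cycle = {}
    pending = set(ops)
    cycle = 0
    while pending:
        busy = set()   # ISs that have already issued an op in this cycle
        for name in sorted(pending):   # alphabetical tie-break, an assumption
            deps, latency = ops[name]
            slot = placement[name]
            operands_ready = all(
                d in ready_at and ready_at[d] <= cycle for d in deps
            )
            if operands_ready and slot not in busy:
                busy.add(slot)
                issue_cycle[name] = cycle
                ready_at[name] = cycle + latency
        pending -= set(issue_cycle)
        cycle += 1
    return issue_cycle


# Hypothetical example: a and b share IS 0, so b waits one cycle for the slot.
example_ops = {
    "a": ([], 1),
    "b": ([], 2),
    "c": (["a", "b"], 1),
    "d": (["c"], 1),
}
example_placement = {"a": 0, "b": 0, "c": 1, "d": 0}
issue_times = dynamic_issue(example_ops, example_placement)
```

In this example the placement is fixed before execution, but the issue cycles fall out of operand arrival and slot availability during the run, which is the part a statically scheduled design would instead fix in the compiler.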

A second class of dynamically reconfigurable architectures avoids the overhead of dynamic scheduling by supporting VLIW-like static scheduling [24]. Instead of doing the scheduling in hardware, where the scheduling logic burns power, the scheduling for ADRES, MorphoSys and Silicon Hive architectures is done by a compiler. Compilers can do this efficiently for loops with regular, predictable behavior and high ILP, as found in many DSP applications. As for VLIW architectures, software pipelining [39, 61] is very important for exposing the ILP in software kernels, so most compiler techniques [20, 22, 25, 48, 52, 54, 55] for statically scheduled CGRAs implement some form of software pipelining.
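
To make the idea of software pipelining concrete, the following minimal sketch assumes modulo scheduling, one common form of software pipelining; the text above does not say which form any particular compiler uses. A new loop iteration is started every `ii` cycles (the initiation interval), so operations from different iterations overlap. The loop body, the unit names and both function names are hypothetical.

```python
def modulo_schedule_trace(body_schedule, ii, iterations):
    """Expand a one-iteration schedule into a per-cycle trace.

    body_schedule: {op name: start cycle within a single iteration}
    ii:            initiation interval, i.e. iteration i starts at cycle i * ii
    Returns        {cycle: [(op name, iteration index), ...]}
    """
    trace = {}
    for i in range(iterations):
        for op, start in body_schedule.items():
            trace.setdefault(i * ii + start, []).append((op, i))
    return dict(sorted(trace.items()))


def resource_conflicts(body_schedule, unit_of, ii):
    """Two ops on the same unit collide if their start cycles are equal modulo ii."""
    seen = {}
    conflicts = []
    for op, start in body_schedule.items():
        key = (unit_of[op], start % ii)
        if key in seen:
            conflicts.append((seen[key], op))
        else:
            seen[key] = op
    return conflicts


# Hypothetical four-operation loop body, one operation per cycle.
body = {"load": 0, "mul": 1, "add": 2, "store": 3}
trace = modulo_schedule_trace(body, ii=1, iterations=4)

# With only two units, ii = 1 is infeasible but ii = 2 is conflict-free.
units = {"load": "mem", "mul": "alu", "add": "alu", "store": "mem"}
conflicts_ii1 = resource_conflicts(body, units, ii=1)
conflicts_ii2 = resource_conflicts(body, units, ii=2)
```

With `ii=1`, cycle 3 of the trace holds one operation from each of the four iterations, which is the ILP that the pipelined schedule exposes. The conflict check shows why the available resources bound how small the initiation interval can be.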

A final class of CGRAs consists of the statically reconfigurable, dynamically scheduled architectures, such as KressArray or PACT (neglecting the time-consuming partial reconfigurability of the PACT). The compiler performs placement and routing, and the progress of code execution is guided by tokens or event signals that are passed along with data. Thus the control is dynamic, and it is distributed over the token or event path, similar to the way in which transport-triggered architectures [17] operate. These statically reconfigurable CGRAs do not require software pipelining techniques because there is no temporal mapping. Instead, the spatial mapping and the control implemented in the tokens or event signals implement a hardware pipeline.
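
The way token-guided control yields a hardware pipeline can be sketched with a small simulation. This is a toy model under stated assumptions, not a description of KressArray or PACT: each spatially mapped unit has a one-entry input register, a unit fires only when a token is present, and a result moves on only when the next unit's register is empty. The function name and the three example stages are hypothetical, and `None` is reserved to mean "no token".

```python
def run_token_pipeline(stages, inputs):
    """Simulate a linear chain of units driven by token availability.

    stages: list of one-argument functions, one per spatially mapped unit
    inputs: data values to stream through the chain (must not be None)
    Returns (list of outputs, number of simulated cycles)
    """
    regs = [None] * len(stages)   # regs[i]: token waiting at the input of unit i
    pending = list(inputs)
    outputs = []
    cycles = 0
    while pending or any(r is not None for r in regs):
        # Visit units from last to first so each token advances one unit per cycle.
        for i in reversed(range(len(stages))):
            if regs[i] is None:
                continue                  # no token, so the unit does not fire
            result = stages[i](regs[i])
            if i == len(stages) - 1:
                outputs.append(result)    # last unit emits the final result
                regs[i] = None
            elif regs[i + 1] is None:
                regs[i + 1] = result      # hand data and token to the next unit
                regs[i] = None
        if pending and regs[0] is None:
            regs[0] = pending.pop(0)      # inject the next input token
        cycles += 1
    return outputs, cycles


# Hypothetical three-unit mapping of y = (2 * x + 3) ** 2.
example_stages = [lambda x: 2 * x, lambda x: x + 3, lambda x: x * x]
results, cycle_count = run_token_pipeline(example_stages, [1, 2, 3])
```

No schedule is computed anywhere in this model. Once the chain is full, every unit works on a different input in the same cycle purely because tokens arrive, which is the sense in which the spatial mapping plus token control forms a pipeline without software pipelining.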
