Coarse-Grained Reconfigurable Array Architectures - Signal Processing Systems

Digital Signal Processing Reference

In-Depth Information

of CGRA architectures. For that reason, it is very difficult to compare the efficiency

(compilation time) and effectiveness (quality of generated code) of the different

techniques.

The most widely applicable static scheduling techniques use different forms of

Modulo Resource Routing Graphs (MRRGs). RRGs are time-space graphs, in which

all resources (space dimension) are modeled with vertices. There is one such vertex

per resource per cycle (time dimension) in the schedule being generated. Directed

edges model the connections over which data values can flow from resource to

resource. The schedule, placement, and routing problem then becomes a problem

of mapping the Data Dependence Graph (DDG) of some loop body on the RRG.

Scheduling refers to finding the right cycle to perform an operation in the schedule,

placement refers to finding the right IS in that cycle, and routing refers to finding

connections to transfer data from producing operations to consuming operations. In

the case of a modulo scheduler, the modulo constraint is enforced by modeling all

resource usage in the modulo time domain. This is done by modeling the appropriate

modulo reservation tables [ 61 ] on top of the RRG, hence the name MRRG.

The granularity of its vertices depends on the precise compiler algorithm. One

modulo graph embedding algorithm [ 54 ] models whole ISs or whole RFs with

single vertices, whereas the simulated-annealing technique in the DRESC [ 20 , 48 ]

compiler that targets ADRES instances models individual ports to ISs and RFs as

separate vertices. Typically, fewer nodes that model larger components lead to faster

compilation because the graph mapping problem operates on a smaller graph, but

also to lower code quality because some combinations of resource usage cannot be

modeled precisely.

Several types of code schedulers exist. In DRESC, simulated annealing is used to

explore different placement and routing options until a valid placement and routing

of all operations and data dependencies is found. The cost function used during the

simulated annealing is based on the total routing cost, i.e., the combined resource

consumption of all placed operations and of all routed data dependencies. In this

technique, a huge number of possible routes is evaluated, as a result of which

the technique is very slow. Other CGRA scheduling techniques [ 25 , 52 , 54 , 55 ]

operate much more like (modulo) list schedulers [ 24 ] . In the best list-based CGRA

schedulers, the used heuristics also rely heavily on routing costs. However, as

these techniques try to optimize routes of individual data dependencies in a greedy

manner, with little backtracking, they explore orders of magnitude fewer routes.

Those schedulers are therefore one to two orders of magnitude faster than the

simulated-annealing approach. They are also less effective, but for at least some

architectures [ 52 ] , the difference in generated code quality is very small.

As an alternative to simulated-annealing, Lee et al. use quantum-inspired

evolutionary algorithms [ 41 ] . In their compiler, they use this algorithm to improve

a suboptimal schedule already found by a list scheduler. Their evaluation is limited

to rather small loops and a CGRA with limited dynamic reconfigurability, so at this

point it is impossible to compare their algorithm's effectiveness and efficiency to

those of the other algorithms.

Signal Processing Systems

Search WWH ::

Custom Search

Home