Digital Signal Processing Reference
In-Depth Information
of CGRA architectures. For that reason, it is very difficult to compare the efficiency
(compilation time) and effectiveness (quality of generated code) of the different
techniques.
The most widely applicable static scheduling techniques use different forms of
Modulo Resource Routing Graphs (MRRGs). RRGs are time-space graphs, in which
all resources (space dimension) are modeled with vertices. There is one such vertex
per resource per cycle (time dimension) in the schedule being generated. Directed
edges model the connections over which data values can flow from resource to
resource. The schedule, placement, and routing problem then becomes a problem
of mapping the Data Dependence Graph (DDG) of some loop body on the RRG.
Scheduling refers to finding the right cycle to perform an operation in the schedule,
placement refers to finding the right IS in that cycle, and routing refers to finding
connections to transfer data from producing operations to consuming operations. In
the case of a modulo scheduler, the modulo constraint is enforced by modeling all
resource usage in the modulo time domain. This is done by modeling the appropriate
modulo reservation tables [ 61 ] on top of the RRG, hence the name MRRG.
The granularity of its vertices depends on the precise compiler algorithm. One
modulo graph embedding algorithm [ 54 ] models whole ISs or whole RFs with
single vertices, whereas the simulated-annealing technique in the DRESC [ 20 , 48 ]
compiler that targets ADRES instances models individual ports to ISs and RFs as
separate vertices. Typically, fewer nodes that model larger components lead to faster
compilation because the graph mapping problem operates on a smaller graph, but
also to lower code quality because some combinations of resource usage cannot be
modeled precisely.
Several types of code schedulers exist. In DRESC, simulated annealing is used to
explore different placement and routing options until a valid placement and routing
of all operations and data dependencies is found. The cost function used during the
simulated annealing is based on the total routing cost, i.e., the combined resource
consumption of all placed operations and of all routed data dependencies. In this
technique, a huge number of possible routes is evaluated, as a result of which
the technique is very slow. Other CGRA scheduling techniques [ 25 , 52 , 54 , 55 ]
operate much more like (modulo) list schedulers [ 24 ] . In the best list-based CGRA
schedulers, the used heuristics also rely heavily on routing costs. However, as
these techniques try to optimize routes of individual data dependencies in a greedy
manner, with little backtracking, they explore orders of magnitude fewer routes.
Those schedulers are therefore one to two orders of magnitude faster than the
simulated-annealing approach. They are also less effective, but for at least some
architectures [ 52 ] , the difference in generated code quality is very small.
As an alternative to simulated-annealing, Lee et al. use quantum-inspired
evolutionary algorithms [ 41 ] . In their compiler, they use this algorithm to improve
a suboptimal schedule already found by a list scheduler. Their evaluation is limited
to rather small loops and a CGRA with limited dynamic reconfigurability, so at this
point it is impossible to compare their algorithm's effectiveness and efficiency to
those of the other algorithms.
Search WWH ::




Custom Search