Digital Signal Processing Reference
ALUs and multipliers with direct connections between them and their local RFs.
These direct connections within each IS can take care of a lot of data transfers,
thus freeing time on the shared bus-based interconnect that connects all ISs. Thus,
the local interconnect within each IS compensates for the lack of a scaling global
interconnect. One advantage of this clustering approach is that the compiler can be
tuned specifically for this combination of local and global connections and for the
fact that it does not need to support heterogeneous ISs. Whether or not this type of
design is more power-efficient than that of CGRAs with more design freedom and
potentially more heterogeneity is unclear at this point in time. At least, we know
of no studies from which, e.g., utilization numbers can be derived that allow us to
compare the two approaches.
Some architectures combine the flexibility of heterogeneous ADRES ISs with
clustering. For example, the CGRA Express [57] and the expression-grained
reconfigurable array (EGRA) [3] architectures feature heterogeneous clusters of
relatively simple, fast ALUs. Within the clusters, those ALUs are chained by means
of a limited number of latchless connections. Through careful design, the delay
of those chains is comparable to the delay of other, more complex ISs on the
CGRA that bound the clock frequency. So the chaining does not affect the clock
frequency. It does, however, allow multiple dependent operations to be executed
within one clock cycle, which can improve performance significantly. As the chains
and clusters are composed of existing components such as ISs, buses, multiplexers
and connections, these clustered designs do not really extend the design space
of non-clustered CGRAs like ADRES. Still, it can be useful to treat clusters as
a separate design level in between the IS component level and the whole array
architecture level, for example because it allows code generation algorithms in
compilers to be tuned for their existence [57].
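The effect of chaining on schedule length can be illustrated with a small model. The sketch below is not taken from the CGRA Express or EGRA papers: the `Op` type, the delay values, and the greedy packing rule are assumptions made purely for illustration. It counts the cycles needed for a chain of dependent operations when consecutive operations may share a cycle, provided that their summed delay fits within the clock period and the chain length limit is respected.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    name: str
    delay_ns: float  # assumed combinational delay of the ALU executing this op


def cycles_without_chaining(chain):
    """Each dependent operation occupies its own clock cycle."""
    return len(chain)


def cycles_with_chaining(chain, clock_period_ns, max_chain_len):
    """Greedily pack consecutive dependent ops into one cycle.

    An op joins the current cycle only if the chain stays within
    max_chain_len ops and the summed delay fits in the clock period,
    so the clock frequency itself is left unchanged.
    """
    cycles = 0
    used_ns = 0.0
    used_ops = 0
    for op in chain:
        if op.delay_ns > clock_period_ns:
            raise ValueError(f"{op.name} does not fit in one clock cycle")
        fits = (
            0 < used_ops < max_chain_len
            and used_ns + op.delay_ns <= clock_period_ns
        )
        if fits:
            used_ns += op.delay_ns
            used_ops += 1
        else:
            # Start a new cycle with this op as the head of a new chain.
            cycles += 1
            used_ns = op.delay_ns
            used_ops = 1
    return cycles


# Four dependent simple operations, each with a hypothetical 1.0 ns delay.
chain = [Op("add", 1.0), Op("shift", 1.0), Op("and", 1.0), Op("sub", 1.0)]
unchained = cycles_without_chaining(chain)
chained = cycles_with_chaining(chain, clock_period_ns=2.5, max_chain_len=2)
```

With these assumed numbers, two simple operations fit in the 2.5 ns period set by the slower, more complex ISs, so the dependent chain needs half as many cycles as it does without chaining.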
A specific type of clustering was proposed to handle floating-point arithmetic.
While most research on CGRAs is limited to integer and fixed-point arithmetic, Lee
et al. proposed to cluster two ISs to handle floating-point data [41]. In their design,
both ISs in the cluster can operate independently on integer or fixed-point data, but
they can also cooperate by means of a special direct interconnect between them.
When they cooperate, one IS in the cluster consumes and handles the mantissas,
while the other IS consumes and produces the exponents. As a single IS can thus be
used for both floating-point and integer computations, Lee et al. are able to achieve
high utilization for integer applications, floating-point applications, as well as mixed
applications.
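The division of labor between the two cooperating ISs can be sketched for a floating-point multiplication. This is only an illustration under assumptions, not the datapath of Lee et al.: the function names are hypothetical, and Python's `math.frexp` and `math.ldexp` stand in for the bit-level unpacking and packing that hardware would perform. One function plays the role of the mantissa IS, the other that of the exponent IS, and the normalization shift models the information passed over the direct interconnect.

```python
import math


def mantissa_is_multiply(ma, mb):
    """Role of the mantissa IS: multiply the two mantissas."""
    return ma * mb


def exponent_is_add(ea, eb):
    """Role of the exponent IS: add the two exponents (integer arithmetic)."""
    return ea + eb


def clustered_fp_multiply(a, b):
    """Multiply a and b by handling mantissas and exponents separately."""
    # Unpack: a == ma * 2**ea with 0.5 <= |ma| < 1 (or ma == 0).
    ma, ea = math.frexp(a)
    mb, eb = math.frexp(b)

    m = mantissa_is_multiply(ma, mb)
    e = exponent_is_add(ea, eb)

    # Normalization: the mantissa product may fall below 0.5, so the
    # mantissa side reports a shift that the exponent side must apply.
    m_norm, shift = math.frexp(m)
    return math.ldexp(m_norm, e + shift)


product = clustered_fp_multiply(3.0, 4.0)
```

Note that the exponent path needs only integer additions, which is consistent with the observation that the same ISs remain usable for integer work when they are not cooperating.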
With respect to utilization, it is clear that the designs of Fig. 7a, b will only be
utilized well if a lot of multiplications need to be performed. Otherwise, the
area-consuming multipliers remain unused. To work around this problem, the sharing
of large resources such as multipliers between ISs has been proposed in the RSPA
CGRA design [33]. Figure 7d depicts one row of ISs that do not contain multipliers
internally, but that are connected to a shared multiplier through switches and a
shared bus. The advantage of this design, compared to an ADRES design in which
each row features three pure ALU ISs and one ALU+MULT IS, is that this design
allows the compiler to schedule multiplications in all ISs (albeit only one per cycle),