3.3.2 Register Files
Compilers for CGRA architectures place operations in ISs, thereby also scheduling them, and route the data flow over the connections between the ISs. Those connections may be direct connections, latched connections, or even connections that go through RFs. Therefore most CGRA compilers treat RFs not as temporary storage but as interconnects that can span multiple cycles, so that RFs can be treated uniformly with the other connections during routing. A direct consequence of this compiler approach is that the design-space freedom of interconnects extends to the placement of RFs between ISs. During the Design Space Exploration (DSE) for a specific CGRA instance in a CGRA design template such as the ADRES or Silicon Hive templates, both the real connections and the RFs have to be explored, and they have to be explored together. Just like the number of real interconnect wires and their topology, the size of the RFs, their location and their number of ports contribute to the interconnectivity of the ISs. We refer to [ 11 , 47 ] for DSEs that study both RFs and interconnects.
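To make the idea of RFs as multi-cycle interconnects concrete, the following sketch routes one value from the output of an issue slot A to the input of an issue slot B over a time-expanded graph whose nodes are (resource, cycle) pairs. The fabric, the resource names (`A.out`, `latch`, `RF`, `B.in`) and the latencies are hypothetical, and the model deliberately ignores resource conflicts and the modulo wrap-around that a real modulo-scheduling CGRA compiler would need. The point is that the RF appears as just another set of routing edges: a write edge, a read edge, and a self-edge that keeps the value for one more cycle.

```python
from collections import deque

# Hypothetical tiny fabric: issue slot A feeds issue slot B through three alternatives.
# Each edge is (source resource, destination resource, latency in cycles).
EDGES = [
    ("A.out", "B.in", 0),   # direct connection: same cycle
    ("A.out", "latch", 1),  # latched connection: one-cycle delay
    ("latch", "B.in", 0),
    ("A.out", "RF", 1),     # write port of a small distributed RF
    ("RF", "RF", 1),        # the value stays in the RF for another cycle
    ("RF", "B.in", 0),      # read port of the same RF
]


def route(consume_cycle: int, produce_cycle: int = 0):
    """Breadth-first search over (resource, cycle) nodes from A's output to B's input.

    RF edges are handled exactly like wire edges. Returns the list of
    (resource, cycle) hops, or None if no route reaches B at consume_cycle.
    """
    start = ("A.out", produce_cycle)
    target = ("B.in", consume_cycle)
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk the parent links back to the producer.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        resource, cycle = node
        for src, dst, latency in EDGES:
            nxt = (dst, cycle + latency)
            if src == resource and nxt[1] <= consume_cycle and nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None
```

`route(0)` uses the direct connection and `route(1)` finds the latched connection, but `route(3)` can only be satisfied through the RF, returning `[("A.out", 0), ("RF", 1), ("RF", 2), ("RF", 3), ("B.in", 3)]`. Removing the `("RF", "RF", 1)` edge, i.e., giving the RF no capacity to hold values, makes `route(3)` return `None`.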
Besides their size and ports, another important aspect is that RFs can be rotating [ 62 ] . The power and delay overhead of rotation is very small in distributed RFs, simply because these RFs are small themselves. Still, rotation can provide important functionality. Consider a dynamically reconfigurable CGRA that executes a loop iterating over x configurations, i.e., each iteration takes x cycles. For a write port of an RF, this means that every x cycles the same address bits are fetched from the configuration memory to configure the address set at that port. In other words, every x cycles a new value is written into the register specified by that same address. Values can therefore stay in the same register for at most x cycles before they are overwritten by a new value from the next iteration. In many loops, however, some values have a lifetime that spans more than x cycles, because they live across multiple loop iterations. To avoid having to insert additional data transfers in the loop schedules, rotating registers can be used. At the end of every iteration of the loop, all values in rotating registers rotate into another register, so that old values end up in registers where they are not overwritten by newer values.
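The overwrite problem and the effect of rotation can be illustrated with a small simulation. It is a sketch under stated assumptions: rotation is modeled as a base offset that is decremented modulo the RF size at the end of every iteration (the scheme used by rotating register files in some VLIW processors), which is not necessarily how [ 62 ] or any particular CGRA implements it, and `RotatingRegisterFile` and `run_loop` are hypothetical names. Every iteration writes to the same logical register 0, as the configuration fetched every x cycles dictates, and consumes the value produced `distance` iterations earlier.

```python
class RotatingRegisterFile:
    """Toy distributed RF; with rotation, logical addresses map to shifting physical registers."""

    def __init__(self, size: int, rotating: bool) -> None:
        self.size = size
        self.rotating = rotating
        self.regs = [None] * size
        self.base = 0  # hypothetical rotating base: offset added to every logical address

    def _physical(self, logical: int) -> int:
        return (logical + self.base) % self.size

    def write(self, logical: int, value) -> None:
        self.regs[self._physical(logical)] = value

    def read(self, logical: int):
        return self.regs[self._physical(logical)]

    def end_of_iteration(self) -> None:
        # Decrementing the base means a value written to logical r is found at
        # logical r + 1 in the next iteration, so it escapes the next write to r.
        if self.rotating:
            self.base = (self.base - 1) % self.size


def run_loop(iterations: int, distance: int, rotating: bool, size: int = 4) -> list:
    """Each iteration consumes the value produced `distance` iterations earlier, then
    produces a new value i * 10 into logical register 0 (the same address bits every time)."""
    if not 0 < distance < size:
        raise ValueError("distance must be between 1 and size - 1")
    rf = RotatingRegisterFile(size, rotating)
    consumed = []
    for i in range(iterations):
        if i >= distance:
            # Rotating RF: the old value has moved to logical `distance`.
            # Static RF: logical 0 is the only place it was ever stored, and it has been overwritten.
            consumed.append(rf.read(distance if rotating else 0))
        rf.write(0, i * 10)
        rf.end_of_iteration()
    return consumed
```

With `distance=2`, `run_loop(5, 2, rotating=True)` returns `[0, 10, 20]`, the values produced two iterations earlier, whereas the static RF returns `[10, 20, 30]` because register 0 has been overwritten in the meantime. In the rotating case the consumer simply addresses logical register `distance` in every iteration, so its address bits also stay constant, which is what a configuration repeated every x cycles requires.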
3.3.3 Predicates, Events and Tokens
To complete this overview of CGRA interconnects, we want to point out that it can be very useful to have interconnects of different widths. The datapath width can be as small as 8 bits or as wide as 64 or 128 bits. The latter widths are typically used to pass SIMD data. However, as not all data is SIMD data, not all paths need to have the full width. Moreover, most CGRA designs and the code mapped onto them feature signals that are only one or a few bits wide, such as predicates, events or tokens. Using the full-width datapath for these narrow signals wastes resources. Hence it is often useful to add a second, narrow datapath for control signals like tokens or events and for predicates. How dense that narrow datapath has to be depends on the type of loops one wants to run on the CGRA. For example, multimedia coding and decoding