Coarse-Grained Reconfigurable Array Architectures - Signal Processing Systems


3.3.2 Register Files

Compilers for CGRA architectures place operations in ISs, thereby also scheduling them, and route the data flow over the connections between the ISs. Those connections may be direct connections, latched connections, or even connections that go through RFs. Most CGRA compilers therefore treat RFs not as temporary storage but as interconnects that can span multiple cycles, so RFs can be treated uniformly with the connections during routing. A direct consequence of this compiler approach is that the design-space freedom of interconnects extends to the placement of RFs between ISs. During the Design Space Exploration (DSE) for a specific CGRA instance in a CGRA design template such as the ADRES or Silicon Hive templates, the real connections and the RFs have to be explored together. Just like the number of real interconnect wires and their topology, the size of the RFs, their location and their number of ports then contribute to the interconnectivity of the ISs. We refer to [11, 47] for DSEs that study both RFs and interconnects.
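The idea of routing through RFs in the same way as through wires can be illustrated with a small sketch. It assumes a made-up fabric in which every routing resource is an edge annotated with the range of cycles a value may spend on it: zero for a direct connection, exactly one for a latched connection, and a range for an RF. The node names, the delay ranges and the function `find_route` are all hypothetical and do not describe any actual CGRA compiler.

```python
from collections import deque

# Hypothetical toy fabric: each edge is (src, dst, min_delay, max_delay) in cycles.
EDGES = [
    ("IS0", "IS1", 0, 0),  # direct connection: value arrives in the same cycle
    ("IS0", "IS2", 1, 1),  # latched connection: exactly one cycle
    ("IS0", "RF0", 0, 0),  # write port of a register file
    ("RF0", "IS3", 1, 4),  # the value may sit in RF0 for 1 to 4 cycles
]


def find_route(src, dst, delay, edges=EDGES):
    """Breadth-first search for a path from src to dst taking exactly `delay` cycles.

    Wires, latches and RFs are handled by the same loop; they differ only
    in the delay range attached to their edge.
    """
    queue = deque([(src, 0, [src])])
    visited = {(src, 0)}
    while queue:
        node, t, path = queue.popleft()
        if node == dst and t == delay:
            return path
        for a, b, lo, hi in edges:
            if a != node:
                continue
            for d in range(lo, hi + 1):
                nt = t + d
                if nt <= delay and (b, nt) not in visited:
                    visited.add((b, nt))
                    queue.append((b, nt, path + [b]))
    return None  # no resource combination bridges exactly `delay` cycles


route = find_route("IS0", "IS3", 3)
```

In this toy fabric a producer in `IS0` and a consumer in `IS3` scheduled three cycles apart can only be connected through `RF0`, which is why the RF's size, location and ports matter as much to routability as the wires do.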

Besides their size and ports, another important aspect is that RFs can be rotating [62]. The power and delay overhead of rotation is very small in distributed RFs, simply because these RFs are themselves small. Still, rotation can provide important functionality. Consider a dynamically reconfigurable CGRA on which a loop is executed that iterates over x configurations, i.e., each iteration takes x cycles. That means that for a write port of an RF, every x cycles the same address bits get fetched from the configuration memory to configure the address set at that port. In other words, every x cycles a new value is written into the register specified by that same address. This implies that values can stay in the same register for at most x cycles; then they are overwritten by a new value from the next iteration. In many loops, however, some values have a lifetime that spans more than x cycles, because it spans multiple loop iterations. To avoid having to insert additional data transfers in the loop schedules, rotating registers can be used. At the end of every iteration of the loop, all values in rotating registers rotate into another register, so that old values end up where they are not overwritten by newer values.
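A minimal sketch of this behaviour follows. It assumes the rotation is modelled as a rotation base that is added to every configured (logical) address and decremented once per loop iteration, which is one common way to realize rotating registers. The text above only states that values move to another register, so this mechanism, the class `RotatingRegisterFile` and the helper `run_loop` are illustrative assumptions.

```python
class RotatingRegisterFile:
    """Toy rotating RF: a rotation base is added to every logical address."""

    def __init__(self, size):
        self.size = size
        self.regs = [None] * size
        self.base = 0  # rotation base, changed once per loop iteration

    def _physical(self, logical):
        return (logical + self.base) % self.size

    def write(self, logical, value):
        self.regs[self._physical(logical)] = value

    def read(self, logical):
        return self.regs[self._physical(logical)]

    def rotate(self):
        # After decrementing the base, a value written at logical address r
        # becomes visible at logical address r + 1.
        self.base = (self.base - 1) % self.size


def run_loop(iterations, rf_size=4):
    """Each iteration writes to the same configured address (0) and reads
    the value produced two iterations earlier at address 2."""
    rf = RotatingRegisterFile(rf_size)
    seen = []
    for i in range(iterations):
        rf.write(0, f"v{i}")     # same address bits every iteration
        seen.append(rf.read(2))  # value whose lifetime spans two iterations
        rf.rotate()              # end of iteration
    return seen


history = run_loop(4)
```

Although every iteration writes to logical address 0, the value from two iterations earlier is still readable at logical address 2, so no extra data transfers have to be scheduled to keep it alive.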

3.3.3 Predicates, Events and Tokens

To complete this overview of CGRA interconnects, we want to point out that it can be very useful to have interconnects of different widths. The datapath width can be as small as 8 bits or as wide as 64 or 128 bits. The latter widths are typically used to pass SIMD data. However, as not all data is SIMD data, not all paths need to have the full width. Moreover, most CGRA designs and the code mapped onto them feature signals that are only one or a few bits wide, such as predicates, events or tokens. Using the full-width datapath for these narrow signals wastes resources. Hence it is often useful to add a second, narrow datapath for predicates and for control signals such as tokens or events. How dense that narrow datapath has to be depends on the type of loops one wants to run on the CGRA. For example, multimedia coding and decoding
