Digital Signal Processing Reference
4.2 ADRES Design Space Exploration
In this part of our case study, we discuss the importance and the opportunities for
DSE within the ADRES template. First, we discuss some concrete ADRES instances
that have been used for extensive experimentation, including the fabrication
of working silicon samples. These examples demonstrate that very power-efficient
CGRAs can be designed for specific application domains.
Afterwards, we will show some examples of DSE results with respect to some of
4.2.1 Example ADRES Instances
During the development of the ADRES tool chain and design, two main ADRES
instances have been worked out. One was designed for multimedia applications
half non-rotating) that is shared with a unified three-issue VLIW processor that
executes non-loop code. The shared RF therefore has six read ports and three write
ports. Both CGRAs feature 16 FUs, four of which can access the memory (which
consists of four single-ported banks) through a queue mechanism that resolves
bank conflicts. Most operations have a latency of one cycle, with the exception of loads, stores,
and multiplications. One important difference between the two CGRAs relates to
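The port counts and the banked memory described above can be illustrated with a small model. This is only a sketch under stated assumptions, not the ADRES implementation: the function names are hypothetical, the port count assumes that every VLIW issue slot reads two operands and writes one result (which matches the six read and three write ports quoted for three issue slots), and the word-interleaved address-to-bank mapping is an assumption because the text does not specify one.

```python
from collections import deque

NUM_BANKS = 4  # four single-ported banks, as stated in the text


def shared_rf_ports(issue_width, reads_per_op=2, writes_per_op=1):
    """Return (read_ports, write_ports) for a shared register file.

    Assumption: each issue slot reads two operands and writes one result.
    """
    return issue_width * reads_per_op, issue_width * writes_per_op


def bank_of(address, num_banks=NUM_BANKS):
    # Assumption: word-interleaved mapping; the real mapping is not given.
    return address % num_banks


def cycles_to_serve(addresses, num_banks=NUM_BANKS):
    """Cycles needed to drain one batch of simultaneous memory accesses.

    Each single-ported bank has its own queue and serves one request per
    cycle, so conflicting requests to the same bank are serialized.
    """
    queues = [deque() for _ in range(num_banks)]
    for addr in addresses:
        queues[bank_of(addr, num_banks)].append(addr)

    cycles = 0
    while any(queues):          # an empty deque is falsy
        for q in queues:
            if q:
                q.popleft()     # one access per bank per cycle
        cycles += 1
    return cycles
```

In this model, four accesses that fall into four different banks finish in one cycle, whereas four accesses that all map to the same bank take four cycles. The queues absorb such conflicts at the cost of extra latency instead of stalling the whole array.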
As the local RFs are only buffered at their input, pipelining registers need to be
inserted in the paths to and from the FUs in order to obtain the desired frequency
targets as indicated in the table. The pipeline latches shown in Table 1 hence directly
sets and the target frequencies are different in both application domains, the SDR
CGRA has one more pipeline register than the multimedia CGRA, and they are
located at different places in the design.
Traditionally, in VLIWs or in out-of-order superscalar processors, deeper pipelining
results in higher frequencies but also in lower IPCs because of larger branch
To illustrate this, Table 3 includes IPCs obtained when generating code for both
CGRAs with and without the pipelining latches.
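The trade-off between clock frequency and IPC can be made concrete with a short calculation. The sketch below assumes the usual first-order model in which throughput is the product of IPC and frequency. The function names are hypothetical, and the numbers are purely illustrative; they are not taken from Table 3.

```python
def ops_per_second(ipc, frequency_hz):
    """First-order throughput model: instructions per cycle times cycles per second."""
    return ipc * frequency_hz


def break_even_ipc_ratio(freq_unpipelined_hz, freq_pipelined_hz):
    """Fraction of the unpipelined IPC that the pipelined design must retain
    to match the unpipelined throughput."""
    return freq_unpipelined_hz / freq_pipelined_hz


# Hypothetical values, chosen only to illustrate the arithmetic.
base = ops_per_second(ipc=10.0, frequency_hz=200e6)   # without extra latches
deep = ops_per_second(ipc=9.5, frequency_hz=300e6)    # with extra latches

print(f"unpipelined: {base:.3g} ops/s, pipelined: {deep:.3g} ops/s")
print(f"break-even IPC ratio: {break_even_ipc_ratio(200e6, 300e6):.3f}")
```

Under this model, added pipelining pays off whenever the IPC drops by less than the frequency gain: in the hypothetical case above, the pipelined design only needs to keep about two thirds of the original IPC to break even.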
The benchmarks mapped onto the multimedia ADRES CGRA are an H.264/AVC
video decoder, a wavelet-based video decoder, an MPEG-4 video coder, a black-and-white
TIFF image filter, and a SHA-2 cryptographic hash algorithm. For each application,
at most the ten hottest inner loops are included in the table. For the SDR ADRES
CGRA, we selected two baseband modem benchmarks: one performs WLAN MIMO channel
estimation and the other implements the remainder of a WLAN SISO receiver. All