Table 6.3 Improved performance by FE-GA for AAC encoding processes

Process                       On CPU            On FE-GA (a)    Speedup
Filter bank and M/S stereo    2,400 K cycles    100 K cycles    24.0×
Quantization                  240 K cycles      31 K cycles     7.7×

(a) Data transfer time by the DTU is included.
6.1.3 Process Mapping on FE-GA
The encoder program was investigated thoroughly to determine, for every encoding
stage, whether that stage is suitable for processing on the FE-GA. The filter bank is
a set of band-pass filters that separates the input audio signal into components in
several frequency subbands. Its calculation consists of additions and multiplications
on streaming data, which makes it suitable for processing on the FE-GA. M/S stereo
extracts the parts of the frequency subbands that appear in both the left and right
channels. Its calculation consists of additions and subtractions of the left and right
subbands, so it is also implemented on the FE-GA. Quantization constrains the output
values of the filter bank to a discrete set of values in accordance with the specified
bit rate. The calculation raises the data to the power of 3/4, and the evaluated
program performs it with a table reference, which is implemented on the FE-GA.
Huffman coding compresses the data by assigning shorter code symbols to more
frequently appearing bit strings. In the implementation, quantization and Huffman
coding are repeated, with the quantization step value increased on each iteration,
until the amount of encoded data satisfies the given bit rate. Because the code length
of the bit strings is variable rather than fixed, it is difficult to improve performance
with the FE-GA, so a CPU is used for Huffman coding. Bit-stream generation arranges
the coded symbols in compliance with the AAC stream format; this stage also runs on
a CPU.
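The stages mapped above, together with the quantization/Huffman iteration that stays on the CPU, can be sketched in Python. This is a minimal illustration under stated assumptions, not the evaluated encoder: M/S stereo uses the common definition mid = (L+R)/2 and side = (L-R)/2; the quantizer uses the simplified AAC-style form sign(x)·floor((|x|·2^(-step/4))^(3/4) + 0.4054), with the step-dependent gain read from a lookup table (a hypothetical stand-in for the program's table reference); and `estimate_bits` is a toy bit counter standing in for real Huffman coding. The names `GAIN`, `MAX_STEP`, `ms_stereo`, `quantize`, `estimate_bits`, and `rate_loop` are all hypothetical.

```python
MAX_STEP = 60  # hypothetical upper bound on the quantization step value

# Hypothetical lookup table: GAIN[step] = 2^(-3*step/16), so that
# (|x| * 2^(-step/4))^(3/4) == |x|^(3/4) * GAIN[step].
GAIN = [2.0 ** (-3.0 * step / 16.0) for step in range(MAX_STEP + 1)]


def ms_stereo(left, right):
    """M/S stereo using only additions/subtractions (and a halving)."""
    mid = [(l + r) / 2.0 for l, r in zip(left, right)]
    side = [(l - r) / 2.0 for l, r in zip(left, right)]
    return mid, side


def quantize(coeffs, step):
    """Simplified AAC-style quantizer: power of 3/4 with a table-driven gain."""
    gain = GAIN[step]
    out = []
    for x in coeffs:
        q = int(abs(x) ** 0.75 * gain + 0.4054)  # 0.4054: rounding offset
        out.append(-q if x < 0 else q)
    return out


def estimate_bits(values):
    """Toy stand-in for Huffman coding: magnitude bits plus one extra bit each."""
    return sum(abs(v).bit_length() + 1 for v in values)


def rate_loop(coeffs, bit_budget):
    """Increase the step value until the estimated coded size fits the budget."""
    step = 0
    q = quantize(coeffs, step)
    while estimate_bits(q) > bit_budget and step < MAX_STEP:
        step += 1
        q = quantize(coeffs, step)
    return step, q


mid, side = ms_stereo([1000.0, -500.0, 20.0], [800.0, -300.0, 10.0])
step, q = rate_loop(mid, bit_budget=12)
print(step, q)
```

The element-wise loops in `ms_stereo` and `quantize` are the streaming, fixed-cost operations the text assigns to the FE-GA, whereas `rate_loop` depends on the size of the variable-length output at each iteration, which is the kind of control flow the text keeps on the CPU.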
We developed FE-GA configurations for the filter bank, M/S stereo, and quantization
for the evaluation. The configurations for the filter bank and M/S stereo were merged
because M/S stereo directly follows the filter bank process. The execution cycles were
measured both on an FE-GA and on a single CPU, as shown in Table 6.3. Note that the
FE-GA cycles are converted to CPU cycles: the FE-GAs operate at 300 MHz, half the CPU
clock frequency of 600 MHz, so one FE-GA cycle corresponds to two CPU cycles. Mapping
the merged filter bank and M/S stereo process and the quantization process onto
FE-GAs yields 24.0- and 7.7-fold speedups, respectively, over sequential execution on
a CPU.
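The cycle conversion and the speedups in Table 6.3 can be reproduced with a few lines of Python. This sketch only restates the arithmetic in the text: one FE-GA cycle at 300 MHz lasts as long as two CPU cycles at 600 MHz, and the speedup is the ratio of CPU cycles to FE-GA cycles expressed in CPU units. The constant and function names are illustrative, not taken from the source.

```python
CPU_CLOCK_MHZ = 600   # CPU clock frequency stated in the text
FEGA_CLOCK_MHZ = 300  # FE-GA clock frequency stated in the text


def fega_to_cpu_cycles(fega_cycles: float) -> float:
    """Express FE-GA cycles as CPU cycles covering the same wall-clock time."""
    return fega_cycles * CPU_CLOCK_MHZ / FEGA_CLOCK_MHZ


def speedup(cpu_cycles: float, fega_cycles_in_cpu_units: float) -> float:
    """Speedup of the FE-GA over sequential execution on one CPU."""
    return cpu_cycles / fega_cycles_in_cpu_units


# Table 6.3 values in thousands of CPU cycles (FE-GA column already converted).
TABLE_6_3 = {
    "Filter bank and M/S stereo": (2400, 100),
    "Quantization": (240, 31),
}

for process, (cpu_k, fega_k) in TABLE_6_3.items():
    print(f"{process}: {speedup(cpu_k, fega_k):.1f}x")
```

Running it prints 24.0x for the merged filter bank and M/S stereo and 7.7x for quantization, matching the table.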
6.1.4 Data Transfer Optimization with DTU
Each processor core has a data transfer unit (DTU) attached to an internal bus con-
nected to the local memories. The DTU simultaneously transfers data between local
memories on different processor cores, between a local memory and on-chip CSM or
 