Hardware Reference
In-Depth Information
Right
Left channel
Time [K Cycles]
Average iterations 2.03
Initialization
AAC stream
generation
Target bit
calculation
Huffman
coding
DTU transfer
27
11
55
55
15
CPU
1
㨯㨯㨯
26
19
19
5
68
7
FE-GA
7
Quantization
Filter bank
132-K cycles
185 (Average)
185
502-K cycles: 27.1x encoding
Fig. 6.9
Trace Gantt chart of one-frame encoding on CPUx1 + FE-GAx1
speedup was 8.0 and the power consumption was 1.36 W on the homogeneous
multicore with two CPUs. The encoding speedup was 27.1 and the power consump-
tion was 1.22 W on the heterogeneous multicore with one CPU and one FE-GA.
Finally, the encoding speedup was 54.1 and the power consumption was 1.46 W on
two CPUs and two FE-GAs. The heterogeneous multicore configuration outper-
formed the homogeneous multicores. Even though the power consumption increases
as the number of processor cores is increased, the speedup in encoding is much
faster. To evaluate the power-performance efficiency of the heterogeneous multi-
core configuration, the index of encoding speed per W [xEnc/W] was calculated for
all the evaluated configurations as listed in the bottom of Fig. 6.8 . Sequential execu-
tion on a single CPU resulted in 3.4 xEnc/W. Parallel execution on a configuration
of two CPUs and two FE-GAs resulted in 37.1 xEnc/W, which is 10.9 times better
power-performance ef fi ciency.
Figure 6.9 is a Gantt chart of one-frame encoding on one CPU and one FE-GA.
The filter bank and quantization were processed on the FE-GA, and DTU data transfers
were performed between executions on the CPU and the FE-GA.
6.2
Real-Time Image Recognition
6.2.1
MX Library
To extract the maximum parallel processing performance of MX, a dedicated MX
library consisting of more than 100 microcode functions is prepared. As shown in
Fig. 6.10 , these library functions are stored in the MX controller. The MX system
 
Search WWH ::




Custom Search