[Figure 6.5 diagram. Labels: CPU#0, CPU#1, SH core, DTU, LM, bus I/F, FE-GA#0, FE-GA#1, ALU/MLT array, com. lists, data, flag]
Fig. 6.5 Implementation of DTU on evaluated chip and its operation
Figure 6.5 outlines the DTU implementation on the evaluated chip, together with an example of its operation. Transfer lists, data, and flags are placed in the local memory (LM). In the example, the DTU interprets a command in CPU#0's LM and transfers data from CPU#0's LM to FE-GA#0's LM.
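The list-driven transfer described above can be sketched as a small software model. This is only an illustration under stated assumptions: the `TransferCommand` fields, the `run_dtu` function, the memory names, and the choice to set a completion flag in CPU#0's LM after each copy are all hypothetical and are not taken from the chip's actual command format.

```python
from dataclasses import dataclass


@dataclass
class TransferCommand:
    """One hypothetical transfer-list entry (field layout is an assumption)."""
    src_mem: str    # name of the source local memory
    src_addr: int   # start offset in the source memory
    dst_mem: str    # name of the destination local memory
    dst_addr: int   # start offset in the destination memory
    length: int     # number of words to copy
    flag_mem: str   # memory holding the completion flag (assumed)
    flag_addr: int  # offset of the completion flag (assumed)


def run_dtu(memories, transfer_list):
    """Interpret each command in order, copy the data, then set its flag."""
    for cmd in transfer_list:
        src = memories[cmd.src_mem]
        dst = memories[cmd.dst_mem]
        # Copy `length` words from the source LM to the destination LM.
        dst[cmd.dst_addr:cmd.dst_addr + cmd.length] = \
            src[cmd.src_addr:cmd.src_addr + cmd.length]
        # Signal completion so that software can poll the flag.
        memories[cmd.flag_mem][cmd.flag_addr] = 1


# Two local memories modeled as plain word arrays.
memories = {"cpu0_lm": [0] * 16, "fega0_lm": [0] * 16}
memories["cpu0_lm"][4:8] = [11, 22, 33, 44]

# A one-entry transfer list: CPU#0's LM -> FE-GA#0's LM, flag in CPU#0's LM.
transfer_list = [TransferCommand("cpu0_lm", 4, "fega0_lm", 0, 4, "cpu0_lm", 15)]
run_dtu(memories, transfer_list)
```

In this model the CPU only prepares the list; the copy loop stands in for the DTU working through the commands without further CPU involvement.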
To maximize encoding performance, the on-chip and off-chip memories are used as follows. Encoding is done frame by frame. Input PCM data and output AAC streams are stored in the off-chip main memory (SDRAM). Before each frame is processed, its PCM data are transferred to the URAM of the target CPU. Intermediate data are also placed in the URAM. For processes that run on an FE-GA, the input data are transferred from the URAM to the local memory of the target FE-GA before execution, and the results stored in that local memory are transferred back to the URAM of the target CPU after execution.
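The per-frame data movement described above can be traced with a short sketch. It models only the order of the copies between SDRAM, URAM, and the FE-GA local memory; the function `encode_stream` and the two placeholder stages are hypothetical and do not perform any real AAC processing.

```python
def encode_stream(pcm_frames, cpu_stage, fega_stage):
    """Trace one frame at a time through the memory hierarchy."""
    sdram_out = []                     # output streams are kept in SDRAM
    for frame in pcm_frames:           # input PCM frames are read from SDRAM
        uram = list(frame)             # SDRAM -> URAM of the target CPU
        uram = cpu_stage(uram)         # intermediate data stay in the URAM
        fega_lm = list(uram)           # URAM -> local memory of the FE-GA
        fega_lm = fega_stage(fega_lm)  # processing on the FE-GA
        uram = list(fega_lm)           # FE-GA local memory -> URAM
        sdram_out.append(uram)         # URAM -> SDRAM
    return sdram_out


# Placeholder stages (assumptions): they only stand in for CPU and FE-GA work.
result = encode_stream(
    [[1, 2], [3, 4]],
    cpu_stage=lambda words: [w * 2 for w in words],
    fega_stage=lambda words: [w + 1 for w in words],
)
```

Each `list(...)` copy corresponds to one of the transfers that, on the chip, would be carried out by a CPU, a DMAC, or a DTU.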
6.1.5 Performance Evaluation on CPU and FE-GA
The processing time for AAC encoding was evaluated on a configuration of one CPU and one FE-GA for the following data transfer methods: by a CPU, by a DMAC, by a DTU without transfer lists, and by a DTU with transfer lists. The encoding options and conditions are described in Table 6.2, and music-1 was used for the evaluation. Figure 6.6 shows how performance improves across the data transfer methods. Encoding on one CPU alone took 58.2 s. Relative to the 192 s length of the input music, this is an encoding speed of 3.3 times. Introducing an FE-GA, with data transferred by a CPU, reduced the encoding time to 14.1 s, which is 13.6 times the encoding speed; the FE-GA thus provides a speedup of 4.1 over the CPU alone. Next, encoding with DMAC transfers resulted in an encoding time of 10.1 s, which is 20.1 times the encoding speed. With DTU transfers without transfer lists, the encoding time was 7.9 s, which is 24.2 times the encoding speed. Finally, DTU transfers operated by transfer lists resulted in an encoding time of 7.5 s, which is 25.6 times the encoding speed.
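The two ratios used in this paragraph can be recomputed from the quoted times. The sketch below assumes that encoding speed is the music length divided by the encoding time and that the speedup over the CPU is the ratio of two encoding times; the function and dictionary names are illustrative. Because the quoted times are rounded, recomputed values need not match every quoted figure exactly.

```python
MUSIC_LENGTH_S = 192.0  # length of the input music, from the text

# Encoding times in seconds, as quoted in the text.
encoding_time_s = {
    "CPU only": 58.2,
    "FE-GA, CPU transfer": 14.1,
    "FE-GA, DMAC transfer": 10.1,
    "FE-GA, DTU without lists": 7.9,
    "FE-GA, DTU with lists": 7.5,
}


def encoding_speed(time_s):
    """Encoding speed relative to the length of the input music."""
    return MUSIC_LENGTH_S / time_s


def speedup_over(baseline_s, time_s):
    """Speedup of one configuration over a baseline configuration."""
    return baseline_s / time_s


baseline = encoding_time_s["CPU only"]
for name, t in encoding_time_s.items():
    print(f"{name}: {t:.1f} s, speed {encoding_speed(t):.1f}, "
          f"speedup over CPU {speedup_over(baseline, t):.1f}")
```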