Digital Signal Processing Reference
In-Depth Information
instruction set, i.e., there are instructions to control the usage of special function
units, while co-processors are mainly visible as memory or I/O mapped peripheral
devices.
4 Memory Architecture
To obtain high performance in DSP, it is not enough to have high-performance
arithmetic units in the data path. It is also essential to be able to feed
operands from memory to the data path; the memory bandwidth should match the
performance of the data path. Quite often algorithms are applied to such a large
number of data structures that there are not enough registers for all the
operands, so operands have to be fetched from memory.
In general, low-cost microprocessors use the von Neumann architecture, where
instructions and data are both placed in the same storage structure, implying that
instruction fetches and data accesses are interleaved over common resources, as
illustrated in Fig. 9. This implies that several cycles are needed to complete,
e.g., the MAC operation needed for implementing one FIR tap in (1), which requires
three memory accesses: fetching the instruction, the coefficient, and the sample.
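
The access count can be made concrete with a minimal sketch. Equation (1) is not reproduced in this excerpt, so the code assumes the usual FIR form, a sum of coefficient-sample products; the function name and the access counter are hypothetical and serve only as illustration. Each tap is modeled as one MAC that costs three accesses over the single shared memory interface.

```python
def fir_output_von_neumann(coeffs, samples):
    """Compute one FIR output as a chain of MACs and count memory accesses.

    Simplified model (an assumption, not a description of a real processor):
    every access goes over one shared bus, and each tap needs an instruction
    fetch, a coefficient read, and a sample read.
    """
    acc = 0
    accesses = 0
    for h, x in zip(coeffs, samples):
        accesses += 1  # fetch the MAC instruction
        accesses += 1  # read the coefficient
        accesses += 1  # read the sample
        acc += h * x   # multiply-accumulate for this tap
    return acc, accesses


if __name__ == "__main__":
    y, n_accesses = fir_output_von_neumann([1, 2, 3], [4, 5, 6])
    print(y, n_accesses)
```

In this model a filter with N taps needs 3N serialized memory accesses per output sample, which is the bottleneck the following architectures try to relieve.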
Higher performance can be expected if memory accesses can be performed
simultaneously. In the first DSP processors, memory bandwidth was improved by
exploiting the Harvard architecture, where instructions and data are stored in
separate, independent memories, as shown in Fig. 10a. This implies that while the
operands for the current instruction are accessed, the next instruction can
already be fetched. This approach doubles the memory bandwidth when one-operand
instructions are used, as can be seen by comparing the timing diagrams in
Figs. 9 and 10a.
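
A rough steady-state model illustrates the comparison. It assumes that each bus carries one access per cycle and that the Harvard instruction fetch fully overlaps with the operand accesses of the previous instruction; pipeline fill and any other stalls are ignored. The function names are hypothetical.

```python
def cycles_per_instruction(num_operands, architecture):
    """Steady-state bus cycles per instruction in a simplified model."""
    if architecture == "von_neumann":
        # One shared bus: the instruction fetch and every operand access
        # are serialized.
        return 1 + num_operands
    if architecture == "harvard":
        # The instruction fetch runs on the program bus in parallel with
        # operand accesses, so the single data bus sets the pace.
        return max(1, num_operands)
    raise ValueError(f"unknown architecture: {architecture}")


def harvard_speedup(num_operands):
    """Speedup of Harvard over von Neumann under the model above."""
    return (cycles_per_instruction(num_operands, "von_neumann")
            / cycles_per_instruction(num_operands, "harvard"))


if __name__ == "__main__":
    for n in (1, 2):
        print(n, harvard_speedup(n))
```

Under these assumptions a one-operand instruction goes from two cycles to one, while a two-operand instruction goes from three cycles to two, because both operands still share the single data bus.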
For two-operand instructions the speedup is less than two, because both operand
accesses still share the single data bus. One of the first modifications was
therefore to use a repeat buffer, where an instruction can be stored to avoid
fetching it from memory, leaving the program bus free for data access. Such an
approach is applicable mainly in loops. In the first iteration, the instruction
(I1 in Fig. 10b) is fetched from program memory, so the program bus is reserved
and the operand accesses are performed in the next cycle (O10 and O11). In
subsequent iterations, the instruction is fetched from the repeat buffer, so the
program bus is free for operand access and two operand accesses over two buses
can be performed in parallel (O20 and O21, etc.).
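
The effect on a loop can be estimated with a small sketch. Fig. 10b is not reproduced in this excerpt, so the cycle counts below rest on one reading of the text: the first iteration takes one cycle for the instruction fetch and one cycle for both operand accesses, and every later iteration takes a single cycle. The plain Harvard baseline is assumed to need two data-bus cycles per two-operand instruction. All names are hypothetical.

```python
def loop_cycles_harvard(iterations):
    """Two-operand instruction repeated on a plain Harvard machine.

    Assumed model: the instruction fetch overlaps with data accesses,
    but both operands are read one after the other over the data bus.
    """
    return 2 * max(0, iterations)


def loop_cycles_repeat_buffer(iterations):
    """Same loop with the instruction held in a repeat buffer.

    Assumed model: iteration 1 costs one cycle for the fetch from program
    memory plus one cycle for both operands; later iterations read both
    operands in parallel over the program and data buses in one cycle.
    """
    if iterations <= 0:
        return 0
    return 2 + (iterations - 1)


if __name__ == "__main__":
    n = 64  # e.g. a 64-tap FIR inner loop
    print(loop_cycles_harvard(n), loop_cycles_repeat_buffer(n))
```

For long loops the cost per iteration approaches one cycle, which is why the technique pays off mainly for inner loops such as a FIR kernel.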
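[Figure: block diagram of a CPU connected to a single memory, which holds both
instructions and operands, over an address bus and a data bus; timing diagram of
the data bus with the access sequence I0, O0, I1, O1, O1, covering a one-operand
instruction and a two-operand instruction]
Fig. 9 von Neumann architecture. In: instruction access, On: operand access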