General-Purpose DSP Processors - Signal Processing Systems


instruction set, i.e., there are instructions to control the usage of special function units, while co-processors are mainly visible as memory-mapped or I/O-mapped peripheral devices.

4 Memory Architecture

In order to obtain high performance in DSP, it is not enough to have high-performance arithmetic units in the data path. It is essential to be able to feed operands from memory to the data path; the memory bandwidth should match the performance of the data path. Quite often, algorithms are applied to such a large number of data structures that there are not enough registers for all the operands, so operands have to be fetched from memory.

In general, low-cost microprocessors use a von Neumann architecture, where instructions and data are both placed in the same storage structure, implying that instruction fetches and data accesses are interleaved over common resources as illustrated in Fig. 9. This implies that several cycles are needed to complete, e.g., the MAC operation needed for implementing one FIR tap in (1), which requires three memory accesses: fetching the instruction, the coefficient, and the sample.
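
The access count above can be made concrete with a minimal sketch. It assumes one MAC instruction per tap and one bus access per cycle on the shared bus; the function name and the sample values are hypothetical and serve only as illustration.

```python
def fir_output_von_neumann(coeffs, samples):
    """Compute one FIR output and count accesses on the single shared bus.

    Assumption: one MAC instruction per tap, one bus access per cycle.
    """
    acc = 0
    bus_accesses = 0
    for c, x in zip(coeffs, samples):
        bus_accesses += 1  # fetch the MAC instruction
        bus_accesses += 1  # fetch the coefficient
        bus_accesses += 1  # fetch the sample
        acc += c * x       # the multiply-accumulate itself
    return acc, bus_accesses


y, accesses = fir_output_von_neumann([1, 2, 3, 4], [5, 6, 7, 8])
print(y, accesses)  # 70 12
```

Under these assumptions, a 4-tap filter needs 12 bus accesses per output sample, i.e., three per tap, even though only one arithmetic operation per tap is performed.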

Higher performance can be expected if memory accesses can be performed simultaneously. In the first DSP processors, memory bandwidth was improved by exploiting the Harvard architecture, where instructions and data are stored in different, independent memories as shown in Fig. 10a. This implies that while the operands for the current instruction are accessed, the next instruction can already be fetched. This approach doubles the memory bandwidth when one-operand instructions are used, as can be seen by comparing the timing diagrams in Figs. 9 and 10a.
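
The bandwidth comparison can be illustrated with a simple cycle-count model. This is a sketch under stated assumptions, not a description of any particular processor: every bus carries one access per cycle, the Harvard machine overlaps the fetch of the next instruction with the operand accesses of the current one, and all operands travel over the single data bus. Both function names are hypothetical.

```python
def von_neumann_cycles(n_instructions, operands_per_instruction):
    # One shared bus: every instruction fetch and every operand access
    # occupies its own cycle.
    return n_instructions * (1 + operands_per_instruction)


def harvard_cycles(n_instructions, operands_per_instruction):
    # Separate program and data buses: the next fetch overlaps the current
    # operand accesses. The data bus still serves one operand per cycle,
    # and the very first fetch cannot be overlapped with anything.
    per_instruction = max(1, operands_per_instruction)
    return 1 + n_instructions * per_instruction


n = 100
print(round(von_neumann_cycles(n, 1) / harvard_cycles(n, 1), 2))  # 1.98
print(round(von_neumann_cycles(n, 2) / harvard_cycles(n, 2), 2))  # 1.49
```

In this model the speedup approaches 2 for one-operand instructions but only about 1.5 for two-operand instructions, because the single data bus becomes the bottleneck.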

For two-operand instructions, the speedup is less than twofold; thus, one of the first modifications was to add a repeat buffer in which an instruction can be stored to avoid fetching it from memory, leaving the program bus free for data access. Such an approach is applicable mainly in loops. In the first iteration, the instruction (I_1 in Fig. 10b) is fetched from program memory, so the program bus is reserved and the operand accesses are performed in the next cycle (O_10 and O_11). In subsequent iterations, the instruction is fetched from the repeat buffer, so the program bus is free for operand access and two operand accesses can be performed in parallel over the two buses (O_20 and O_21, etc.).
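
The bus schedule described above can be sketched as a small trace generator. The labels mirror those in the text, written without the underscore (I1, O10, O11, ...). Which operand travels over which bus is an assumption made for illustration, and the function name is hypothetical.

```python
def repeat_buffer_schedule(iterations):
    """Per-cycle (program_bus, data_bus) activity for one repeated
    two-operand instruction held in a repeat buffer.

    Assumption: the first operand of each pair uses the program bus and
    the second uses the data bus.
    """
    if iterations <= 0:
        return []
    # Cycle 0: the instruction fetch occupies the program bus.
    schedule = [("I1", "-")]
    for k in range(1, iterations + 1):
        # The instruction now comes from the repeat buffer, so both buses
        # carry operands in the same cycle.
        schedule.append((f"O{k}0", f"O{k}1"))
    return schedule


for cycle, (program_bus, data_bus) in enumerate(repeat_buffer_schedule(3)):
    print(cycle, program_bus, data_bus)
```

Under these assumptions, a loop of N iterations occupies N + 1 cycles: one for the initial fetch and one per iteration for the paired operand accesses.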

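Fig. 9 von Neumann architecture. I_n: instruction access, O_n: operand access

[Figure: block diagram of a CPU connected over an address bus and a data bus to a single memory holding both instructions and operands. The timing diagram shows the data bus carrying I_0, O_0, I_1, O_1 in sequence. Panel labels: One-Operand Instruction, Two-Operand Instruction.]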
