Architecture and Instruction Set of the C6x Processor - Digital Signal Processing and Applications with the C6713 and C6416 DSK


The first stage, prolog, contains instructions to build the second-stage loop cycle, and the epilog stage (last stage) contains instructions to finish all loop iterations. Software pipelining is used by the compiler when the optimization option level -o2 or -o3 is invoked. The most efficient software pipelined code has loop trip counters that count down: for example,

for (i = N; i != 0; i--)
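As a minimal sketch of the count-down form, the C function below computes a sum of products with a trip counter that runs from n down to zero. The function name dotp_countdown and its parameter names are illustrative assumptions, not taken from the text, and whether a given compiler actually software-pipelines this loop depends on the optimization level and the surrounding code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (name assumed for illustration): sum of products of
   two 16-bit arrays, written with a down-counting trip counter. */
int32_t dotp_countdown(const int16_t *a, const int16_t *b, int n)
{
    int32_t sum = 0;
    int i;

    assert(n >= 0);               /* a negative count would never reach 0 */

    for (i = n; i != 0; i--) {    /* trip counter counts down to zero */
        sum += (int32_t)(*a++) * (int32_t)(*b++);
    }
    return sum;
}
```

Calling dotp_countdown(a, b, 4) with a = {1, 2, 3, 4} and b = {5, 6, 7, 8} returns 70.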

A dot product example with word-wide hand-coded pipelined code results in (N/2) + 8 cycles to obtain the sum of products of two arrays, with N numbers in each array. This translates to 108 cycles to find the sum of products of 200 numbers, as illustrated in Chapter 8. This efficiency is obtained using instructions such as LDW to load a 32-bit word and multiplying the lower and upper 16-bit numbers separately with the two instructions MPY and MPYH, respectively.
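The word-wide idea can be sketched in portable C: each 32-bit word carries two 16-bit samples, so one load per array feeds two multiplies. The helpers below only emulate what LDW, MPY, and MPYH do on the processor; their names (pack2, mpy_lo, mpy_hi, dotp_words, dotp_cycles) are assumptions made for illustration, and the casts to int16_t rely on the usual two's-complement behavior of common compilers. The cycle estimate simply evaluates the (N/2) + 8 figure quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* Pack two 16-bit samples into one 32-bit word (lo in bits 0..15). */
uint32_t pack2(int16_t lo, int16_t hi)
{
    return ((uint32_t)(uint16_t)hi << 16) | (uint32_t)(uint16_t)lo;
}

/* Emulates MPY: signed product of the lower 16 bits of each word. */
int32_t mpy_lo(uint32_t x, uint32_t y)
{
    return (int32_t)(int16_t)(x & 0xFFFFu) * (int32_t)(int16_t)(y & 0xFFFFu);
}

/* Emulates MPYH: signed product of the upper 16 bits of each word. */
int32_t mpy_hi(uint32_t x, uint32_t y)
{
    return (int32_t)(int16_t)(x >> 16) * (int32_t)(int16_t)(y >> 16);
}

/* Sum of products of n 16-bit samples stored two per word; n must be even. */
int32_t dotp_words(const uint32_t *a, const uint32_t *b, int n)
{
    int32_t sum_lo = 0, sum_hi = 0;
    int i;

    assert(n >= 0 && n % 2 == 0);

    for (i = n / 2; i != 0; i--) {   /* one iteration per 32-bit word */
        uint32_t wa = *a++;          /* stands in for one LDW per array */
        uint32_t wb = *b++;
        sum_lo += mpy_lo(wa, wb);
        sum_hi += mpy_hi(wa, wb);
    }
    return sum_lo + sum_hi;
}

/* Cycle estimate quoted in the text for the hand-coded version. */
int dotp_cycles(int n)
{
    return n / 2 + 8;
}
```

Keeping two partial sums mirrors the way the two multiplies proceed independently in each iteration; dotp_cycles(200) evaluates to 108, matching the figure in the text.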

Removing the epilog section can also reduce the code size. The available options -msn (n = 0, 1, 2) direct the compiler to favor code size reduction over performance. Hand-coded software pipelined code can be produced by first drawing a dependency graph and setting up a scheduling table [8]. In Chapter 8 we discuss software pipelining in conjunction with code efficiency.

3.20 CONSTRAINTS

3.20.1 Memory Constraints

Internal memory is arranged in banks so that loads and stores can occur simultaneously. Since each bank of memory is single-ported, only one access to each bank is performed per cycle. Two memory accesses per cycle can be performed if they do not access the same bank of memory. If multiple accesses are performed to the same bank of memory (within the same space), the pipeline will stall, which adds cycles to the execution time.
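The bank rule can be illustrated with a small model in C. The number of banks and the bank width used here (NUM_BANKS, BANK_WIDTH_BYTES) are placeholder values assumed for the sketch, not figures from the text or from a device data sheet, and the model assumes that consecutive bank-width units are interleaved across banks and that each access touches a single bank. The function names are likewise hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder geometry, assumed for illustration only. */
enum { NUM_BANKS = 4, BANK_WIDTH_BYTES = 2 };

/* Bank selected by a byte address under the assumed interleaving. */
unsigned bank_of(uint32_t addr)
{
    return (unsigned)((addr / BANK_WIDTH_BYTES) % NUM_BANKS);
}

/* True when two accesses issued in the same cycle hit the same bank. */
bool same_bank(uint32_t addr_a, uint32_t addr_b)
{
    return bank_of(addr_a) == bank_of(addr_b);
}

/* Counts how many of n paired accesses (a[i] with b[i]) collide in a bank;
   each collision would cost extra cycles on the processor. */
int count_conflicts(const uint32_t *a, const uint32_t *b, int n)
{
    int conflicts = 0;
    int i;

    assert(n >= 0);

    for (i = 0; i < n; i++) {
        if (same_bank(a[i], b[i])) {
            conflicts++;
        }
    }
    return conflicts;
}
```

Under these assumed values, byte addresses 0 and 8 map to the same bank and would collide, while addresses 0 and 2 fall in different banks and could be accessed in the same cycle.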

3.20.2 Cross-Path Constraints

Since there is one cross-path on each side of the two data paths, there can be at most two instructions per cycle using cross-paths. The following code segment is valid since the two instructions use different cross-paths, one on each side:

        ADD .L1x A1,B1,A0
||      MPY .M2x A2,B2,B3

whereas the following is not valid since one cross-path is used for both instructions:

        ADD .L1x A1,B1,A0
||      MPY .M1x A2,B2,A3
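The rule applied to the two segments above can be expressed as a small check in C. This is only a sketch of the cross-path constraint as stated in this section: the Slot type and the function cross_paths_ok are hypothetical names introduced for illustration, each instruction is reduced to the side of its functional unit and whether it carries the x suffix, and no other resource constraint is modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* One instruction of an execute packet, reduced to what matters here. */
typedef struct {
    int  side;        /* 1 for a unit such as .L1 or .M1, 2 for .L2 or .M2 */
    bool uses_cross;  /* true when the unit carries the x suffix, e.g. .L1x */
} Slot;

/* Returns true when no side's cross-path is used by more than one
   instruction in the packet. */
bool cross_paths_ok(const Slot *packet, int n)
{
    int used[3] = {0, 0, 0};   /* index 1 and 2 count uses per side */
    int i;

    assert(n >= 0);

    for (i = 0; i < n; i++) {
        assert(packet[i].side == 1 || packet[i].side == 2);
        if (packet[i].uses_cross) {
            used[packet[i].side]++;
            if (used[packet[i].side] > 1) {
                return false;  /* same cross-path requested twice */
            }
        }
    }
    return true;
}
```

Describing the first segment as {1, true} and {2, true} passes the check, while the second segment, {1, true} and {1, true}, fails because both instructions request the cross-path of side 1.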
