Digital Signal Processing Reference
In-Depth Information
Fig. 9 Interleaved internal data memory with four memory banks, each 16 bit (2 bytes) wide

Bank 0        Bank 1         Bank 2         Bank 3
byte 0, 1     byte 2, 3      byte 4, 5      byte 6, 7
byte 8, 9     byte 10, 11    byte 12, 13    byte 14, 15
there can exist several optimal schedules. For instance, another one for this example
is reported by Leupers [65], which was computed by a simulated-annealing-based
heuristic.
Branch instructions on the 'C62x, which execute on the .S units, are delayed
branches with a latency of six clock cycles, thus five delay slots are exposed. If
two branches execute in the same issue packet (on .S1 and .S2 in parallel), control
branches to the target for which the branch condition evaluates to true. This can be
used to realize three-way branches. If both branch conditions evaluate to true, the
behavior is undefined.
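The branch behavior described above can be modeled in a few lines. The following is only an illustrative sketch, not TI's specification: the function name `resolve_parallel_branches` and its parameters are hypothetical, and the undefined case is modeled by raising an exception.

```python
BRANCH_LATENCY = 6                  # clock cycles, as stated in the text
DELAY_SLOTS = BRANCH_LATENCY - 1    # cycles after issue that still execute


def resolve_parallel_branches(cond_s1, target_s1, cond_s2, target_s2, fall_through):
    """Return the next control-flow target for two branches issued in the
    same issue packet on .S1 and .S2 (hypothetical helper for illustration).

    Three outcomes are possible, which is what makes this a three-way branch:
    the .S1 target, the .S2 target, or the fall-through path.
    """
    if cond_s1 and cond_s2:
        # The text states that this case is undefined on the hardware;
        # the sketch models it as an error.
        raise ValueError("both branch conditions true: behavior is undefined")
    if cond_s1:
        return target_s1
    if cond_s2:
        return target_s2
    return fall_through
```

With exactly one condition true, control transfers to that branch's target after the five delay slots; with neither true, execution falls through.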
All 'C62x instructions can be predicated. The four most significant bits in the
opcode form a condition field, where the first three bits specify the condition register
tested, and the fourth bit specifies whether to test for equality or non-equality of that
register with zero. Registers A1, A2, B0, B1 and B2 can serve as condition registers.
The condition field code 0000 denotes unconditional execution.
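Decoding the condition field might look like the sketch below. The text does not give the mapping from the 3-bit register code to the concrete registers, nor which value of the fourth bit means "equal to zero"; the table `CREG_NAMES` and the convention that a set test bit means equality with zero are assumptions here and should be checked against the instruction set reference. The function names are hypothetical.

```python
# ASSUMED mapping of the 3-bit register code to condition registers.
CREG_NAMES = {0b001: "B0", 0b010: "B1", 0b011: "B2", 0b100: "A1", 0b101: "A2"}


def decode_condition(opcode):
    """Decode the four most significant bits of a 32-bit opcode.

    Returns None for unconditional execution (field 0000), otherwise a
    (register name, test) pair where test is "== 0" or "!= 0".
    """
    field = (opcode >> 28) & 0xF
    if field == 0b0000:
        return None                      # unconditional execution
    creg = field >> 1                    # first three bits: condition register
    z = field & 1                        # fourth bit: ASSUMED 1 means "== 0"
    if creg not in CREG_NAMES:
        raise ValueError("unassigned condition field code")
    return CREG_NAMES[creg], "== 0" if z else "!= 0"


def is_executed(opcode, regs):
    """Decide whether a predicated instruction takes effect, given a dict
    of current condition register values."""
    cond = decode_condition(opcode)
    if cond is None:
        return True
    reg, test = cond
    return regs[reg] == 0 if test == "== 0" else regs[reg] != 0
```

For example, under these assumptions an opcode whose top four bits are 0010 executes only if B0 is nonzero.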
Usually, branch targets will be at the beginning of an issue packet. However,
branch targets can be any word address in instruction memory and thereby any
instruction, which may also be in the middle of an issue packet. In that case, the
instructions in that issue packet that appear in the program text before the branch
target address will not take effect (they are treated as NOPs).
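A branch into the middle of an issue packet can be illustrated as follows. This is a minimal sketch that assumes an issue packet is represented as a list of (word address, mnemonic) pairs in program order; the function name `effective_packet` and this representation are hypothetical.

```python
def effective_packet(packet, target_addr):
    """Return the mnemonics that take effect when control enters `packet`
    at `target_addr`.

    Instructions located before the branch target in the program text are
    replaced by "NOP"; the target and everything after it execute normally.
    """
    return [instr if addr >= target_addr else "NOP" for addr, instr in packet]


# Three 32-bit instructions issued in parallel (addresses are illustrative).
packet = [(0x100, "ADD"), (0x104, "MPY"), (0x108, "LDW")]
```

Branching to `0x104` in this packet yields `["NOP", "MPY", "LDW"]`, whereas branching to `0x100` leaves the packet unchanged.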
Most 'C62x processor types use interleaved memory banks for the internal
(on-chip) data memory. In most cases, data memory is organized in four 16-bit
wide memory banks, and byte addresses are mapped cyclically across these (see
Fig. 9). Each bank is single-ported memory, thus only one access per bank is possible per
clock cycle. If two load or store instructions try to access addresses in the same bank
in the same clock cycle, the processor stalls for one cycle to serialize the accesses.
To avoid such delays, it is useful to know statically the alignment of addresses
to be accessed in parallel, and to make sure that these end up in different memory
banks. Note also that load-word (LDW) and store-word (STW) instructions, which
access 32-bit data, access two neighboring banks simultaneously. Word addresses
must be aligned on word boundaries, i.e., the two least significant address bits are
zero. Halfword addresses must be aligned on halfword boundaries.
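The bank mapping and the conflict rule can be checked statically with a small model. The sketch below assumes the four-bank, 16-bit-wide layout of Fig. 9; the helper names `bank_of`, `banks_touched`, and `conflicts` are hypothetical and not part of any TI tool.

```python
BANK_WIDTH_BYTES = 2   # each bank is 16 bits wide
NUM_BANKS = 4          # byte addresses are mapped cyclically across the banks


def bank_of(byte_addr):
    """Bank index that holds the given byte address."""
    return (byte_addr // BANK_WIDTH_BYTES) % NUM_BANKS


def banks_touched(byte_addr, size_bytes):
    """Set of banks touched by an aligned access of 1, 2, or 4 bytes."""
    if size_bytes not in (1, 2, 4):
        raise ValueError("unsupported access size")
    if byte_addr % size_bytes != 0:
        raise ValueError("address is not aligned to the access size")
    return {bank_of(a) for a in range(byte_addr, byte_addr + size_bytes)}


def conflicts(access_a, access_b):
    """True if two (address, size) accesses issued in the same cycle share
    a bank, which would stall the processor for one cycle."""
    return bool(banks_touched(*access_a) & banks_touched(*access_b))
```

For instance, a word access at address 0 touches banks 0 and 1, so it conflicts with a halfword access at address 8 (bank 0) but not with a word access at address 4 (banks 2 and 3).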
TI 'C62x and 'C64x processors are fixed-point DSP processors, where the 'C64x
processors have architecture extensions that include, for instance, further support
for SIMD processing (such as four-way 8 bit SIMD addition etc., four-way 16 × 16
bit multiply and eight-way 8 × 8 bit multiply), further instructions such as 32 × 32
multiply and complex multiply, compact (16-bit) instructions that can be mixed with
32-bit instructions [50], hardware support for software pipelining of loops, and more
(2 × 32) registers. The TI 'C67x family also supports floating-point computations.
 
 