Digital Signal Processing Reference
to 10 phases (not all execution phases are shown in Table 3.3). We are assuming that
each FP contains one EP.
For example, at cycle 7, while the instructions in the first FP are in the first
execution phase E1 (which may be the only one), the instructions in the second FP
are in the decoding phase, the instructions in the third FP are in the dispatching
phase, and so on. The instructions in all seven FPs are proceeding through the
various phases. Therefore, at cycle 7, “the pipeline is full.”
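The cycle-7 snapshot described above can be reproduced with a small sketch. It assumes the seven-stage sequence PG, PS, PW, PR, DP, DC, E1 (four fetch phases, then dispatch, decode, and the first execute phase), that one FP enters the pipeline per cycle with no stalls, and that each FP holds one EP. The function name `phase_of` is hypothetical and serves only as an illustration.

```python
# Assumed stage order: four fetch phases, dispatch (DP), decode (DC),
# and the first execute phase (E1). Later execute phases are not modeled.
STAGES = ["PG", "PS", "PW", "PR", "DP", "DC", "E1"]


def phase_of(fp, cycle):
    """Return the stage occupied by fetch packet `fp` at `cycle` (both 1-based).

    Assumes FP n enters the pipeline at cycle n and never stalls.
    Returns None if the FP has not entered yet or has moved past E1.
    """
    index = cycle - fp
    if 0 <= index < len(STAGES):
        return STAGES[index]
    return None


# Snapshot of FP1..FP7 at cycle 7: every stage is occupied.
snapshot = {fp: phase_of(fp, 7) for fp in range(1, 8)}
for fp, stage in snapshot.items():
    print(f"FP{fp}: {stage}")
```

Under these assumptions the first FP is in E1, the second in decode, and the third in dispatch, which is the "full pipeline" condition at cycle 7.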
Most instructions have one execute phase. Instructions such as multiply (MPY),
load (LDH/LDW), and branch (B) take two, five, and six execute phases,
respectively. Additional execute phases are associated with floating-point and
double-precision types of instructions, which can take up to 10 phases. For
example, the double-precision multiply operation (MPYDP), available on the C67x,
has nine delay slots, so that the execution phase takes a total of 10 phases.
The functional unit latency, which represents the number of cycles that an
instruction ties up a functional unit, is 1 for all instructions except
double-precision instructions, available with the floating-point C67x. Functional
unit latency is different from a delay slot. For example, the instruction MPYDP
has a functional unit latency of four but nine delay slots. This implies that no
other instruction can use the associated multiply functional unit for four cycles.
A store has no delay slot but finishes its execution in the third execution phase
of the pipeline.
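The difference between functional unit latency and delay slots can be made concrete with a minimal sketch. The cycle numbering is an assumption (an instruction "issues" in the cycle of its E1 phase), and the names `Timing`, `unit_free_at`, and `result_ready_at` are hypothetical. The MPY and MPYDP figures are the ones quoted in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Timing:
    unit_latency: int  # cycles the functional unit is tied up
    delay_slots: int   # cycles after issue before the result can be used


MPY = Timing(unit_latency=1, delay_slots=1)
MPYDP = Timing(unit_latency=4, delay_slots=9)


def unit_free_at(issue_cycle, timing):
    """First cycle at which another instruction may use the same unit."""
    return issue_cycle + timing.unit_latency


def result_ready_at(issue_cycle, timing):
    """First cycle at which a consuming instruction may issue."""
    return issue_cycle + timing.delay_slots + 1


# An MPYDP issued at cycle 0 releases its multiplier long before
# its result becomes available.
print(unit_free_at(0, MPYDP), result_ready_at(0, MPYDP))
```

With these numbers, the multiplier accepts a new instruction four cycles after an MPYDP issues, while the product itself cannot be consumed until ten cycles after issue.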
If the outcome of a multiply instruction such as MPY is used by a subsequent
instruction, one NOP (no operation) must be inserted after the MPY instruction for
the pipeline to operate properly. Four or five NOPs are to be inserted in case an
instruction uses the outcome of a load or a branch instruction, respectively.
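The NOP counts above follow directly from the delay slots: an instruction with d delay slots has d + 1 execute phases and needs d NOPs before a dependent instruction. The sketch below only tabulates the figures given in the text. It assumes the dependent instruction would otherwise issue in the very next cycle and that no useful instructions are scheduled into the delay slots. The helper names are hypothetical.

```python
# Delay slots for the mnemonics mentioned in the text.
DELAY_SLOTS = {"MPY": 1, "LDH": 4, "LDW": 4, "B": 5, "MPYDP": 9}


def execute_phases(mnemonic):
    """Execute phases = delay slots + 1 (raises KeyError if not listed)."""
    return DELAY_SLOTS[mnemonic] + 1


def nops_needed(mnemonic):
    """NOPs to insert before an immediately dependent instruction."""
    return DELAY_SLOTS[mnemonic]


for name in DELAY_SLOTS:
    print(f"{name}: {execute_phases(name)} execute phases, "
          f"{nops_needed(name)} NOP(s)")
```

In practice a compiler or hand-scheduled code would try to fill these slots with independent instructions rather than NOPs; the table only gives the worst case.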
3.6 REGISTERS
Two sets of register files, each set with 16 registers, are available: register file A (A0
through A15) and register file B (B0 through B15). On the C62x/C67x, registers
A1, A2, B0, B1, and B2 are used as conditional registers. Registers A4 through A7
and B4 through B7
are used for circular addressing. Registers A0 through A9 and B0 through B9
(except B3) are temporary registers. Any of the registers A10 through A15 and B10
through B15 that a subroutine uses must be saved and later restored before the
subroutine returns.
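The register roles listed above can be summarized in a short lookup sketch. The function names and the role labels ("temporary", "saved", "other") are hypothetical. B3 is labeled "other" only because the text excludes it from the temporary registers without stating its role.

```python
def _parse(name):
    """Split a register name such as 'A10' into ('A', 10), validating it."""
    side, number = name[0], int(name[1:])
    if side not in ("A", "B") or not 0 <= number <= 15:
        raise ValueError(f"not a register in file A or B: {name}")
    return side, number


def register_role(name):
    """Classify a register as 'temporary', 'saved', or 'other'."""
    side, number = _parse(name)
    if number >= 10:
        return "saved"      # A10-A15, B10-B15: save and restore if used
    if side == "B" and number == 3:
        return "other"      # B3 is excluded from the temporaries
    return "temporary"      # A0-A9, B0-B9 except B3


def supports_circular_addressing(name):
    """True for A4-A7 and B4-B7."""
    _, number = _parse(name)
    return 4 <= number <= 7


print(register_role("A5"), register_role("B12"), register_role("B3"))
```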
A 40-bit data value can be contained across a register pair. The 32 least
significant bits (LSBs) are stored in the even register (e.g., A2), and the
remaining 8 bits are stored in the 8 LSBs of the next-upper (odd) register (A3).
A similar scheme is used to hold a 64-bit double-precision value within a pair of
registers (even and odd).
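The register-pair layout can be illustrated with a few bit operations. This is a sketch that treats the 40-bit value as an unsigned bit pattern. The function names are hypothetical, and the 64-bit case assumes that the lower 32 bits of the IEEE 754 double go in the even register and the upper 32 bits in the odd register.

```python
import struct

MASK32 = 0xFFFFFFFF
MASK8 = 0xFF


def split_40bit(value):
    """Return (even, odd) register contents for a 40-bit bit pattern."""
    if not 0 <= value < (1 << 40):
        raise ValueError("value does not fit in 40 bits")
    even = value & MASK32          # 32 LSBs go in the even register
    odd = (value >> 32) & MASK8    # remaining 8 bits go in the odd register's LSBs
    return even, odd


def join_40bit(even, odd):
    """Rebuild the 40-bit value from an even/odd register pair."""
    return ((odd & MASK8) << 32) | (even & MASK32)


def split_double(x):
    """Return (even, odd) 32-bit words for a 64-bit double (assumed layout)."""
    low, high = struct.unpack("<II", struct.pack("<d", x))
    return low, high


even, odd = split_40bit(0x123456789A)
print(hex(even), hex(odd))
```

For the value 0x123456789A, the even register would hold 0x3456789A and the odd register 0x12 in its 8 LSBs.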
These 32 registers are considered general-purpose registers. Several
special-purpose registers are also available for control and interrupts: for
example, the address mode register (AMR), used for circular addressing, and the
interrupt control registers, as shown in Appendix B.