Digital Signal Processing Reference
Control hazards denote the fact that branch instructions may disrupt the linear
fetch-decode-execute pipeline flow. Branch instructions are detected only in the
decoding phase and the branch target may, in the case of conditional branches, be
known even later during execution. If subsequent instructions have been fetched,
decoded and partly executed on the “wrong” control flow branch when the branch
is detected or the branch target is known, the effect of these instructions must be
rolled back and the pipeline must restart from the branch target. This implies a
nonzero delay in execution that may differ depending on the type of branch instruction
(unconditional branch, conditional branch taken as expected, or conditional branch
not taken as expected). There are basically two ways in which processors manage
branch delays:
1. Delayed branch: The semantics of the branch instruction are redefined so that it
takes effect on the program counter only after a certain number of delay slots. It is
a task for global instruction scheduling (see Sect. 7) to try to fill these branch
delay slots with useful instructions that need to be executed anyway but do not
influence the branch condition. If no other instruction can be moved to a branch
delay slot, the slot has to be filled with a NOP instruction as a placeholder.
2. Pipeline stall: The entire processor pipeline is frozen until the first instruction
word has been loaded from the branch target. The delay is not explicit in the
program code and may vary depending on the branch instruction type.
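The delay-slot filling described in item 1 can be illustrated with a small sketch. It is a toy model, not a real scheduler: it looks only at the basic block that ends in the branch, tracks only register dependences, and all names (Instr, fill_delay_slots, the register names) are hypothetical. An instruction is moved behind the branch only if it has no read-after-write, write-after-read, or write-after-write dependence on the instructions it would hop over, including the branch itself.

```python
from typing import FrozenSet, List, NamedTuple


class Instr(NamedTuple):
    text: str
    reads: FrozenSet[str]
    writes: FrozenSet[str]


NOP = Instr("nop", frozenset(), frozenset())


def independent(a: Instr, b: Instr) -> bool:
    """True if a and b may be reordered (no RAW, WAR, or WAW dependence)."""
    return not (a.writes & b.reads or a.reads & b.writes or a.writes & b.writes)


def fill_delay_slots(block: List[Instr], branch: Instr, num_slots: int) -> List[Instr]:
    """Return the block, the branch, and num_slots delay slots.

    Instructions are taken from the end of the block when they are
    independent of everything they would move past; leftover slots get NOPs.
    """
    remaining = list(block)
    slots: List[Instr] = []
    i = len(remaining) - 1
    while i >= 0 and len(slots) < num_slots:
        cand = remaining[i]
        # Everything the candidate would hop over if moved behind the branch.
        later = remaining[i + 1:] + [branch]
        if all(independent(cand, other) for other in later):
            slots.insert(0, cand)  # keep original relative order in the slots
            del remaining[i]
        i -= 1
    slots += [NOP] * (num_slots - len(slots))
    return remaining + [branch] + slots


# Hypothetical three-instruction block followed by a conditional branch on p0.
block = [
    Instr("add r1, r2, r3", frozenset({"r2", "r3"}), frozenset({"r1"})),
    Instr("sub r4, r5, r6", frozenset({"r5", "r6"}), frozenset({"r4"})),
    Instr("cmp p0, r1, r0", frozenset({"r1", "r0"}), frozenset({"p0"})),
]
branch = Instr("br p0, target", frozenset({"p0"}), frozenset())

scheduled = fill_delay_slots(block, branch, num_slots=2)
```

In this example only the sub instruction can be moved: cmp computes the branch condition, and add produces the value that cmp reads. The second delay slot is therefore padded with a NOP.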
In particular, conditional branches have a detrimental effect on processor
throughput. For this reason, hardware features and code generation techniques that
reduce the need for (conditional) branches are important. The most prominent
one is predicated execution: each instruction takes an additional operand, a boolean
predicate, which may be a constant or a variable in a predicate register. If the
predicate evaluates to true, the instruction executes as usual. If it evaluates to false,
the effect of that instruction is rolled back, so that it behaves like a NOP instruction.
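As a minimal sketch of predicated execution, the following toy interpreter runs a branch-free version of "if a > b then m = a else m = b". The instruction format, the opcode names, and the "!" prefix for a negated predicate are assumptions made for this illustration and do not correspond to any particular instruction set. The model simply skips an instruction whose guard is false; it does not model executing the instruction and then rolling back its effect.

```python
import operator

# Hypothetical opcodes for the toy machine.
OPS = {
    "mov": lambda a: a,
    "add": operator.add,
    "cmpgt": operator.gt,
}


def run(program, regs):
    """Execute a list of (guard, op, dest, srcs) tuples on a copy of regs.

    guard is None (always execute), a predicate register name, or that
    name prefixed with "!" to execute only when the predicate is false.
    """
    regs = dict(regs)
    for guard, op, dest, srcs in program:
        if guard is not None:
            negate = guard.startswith("!")
            value = bool(regs[guard.lstrip("!")])
            if value == negate:
                continue  # guard is false: the instruction acts as a NOP
        regs[dest] = OPS[op](*(regs[s] for s in srcs))
    return regs


# Straight-line code without any branch: both moves are always fetched,
# but only the one whose guard is true changes register m.
MAX_PROGRAM = [
    (None, "cmpgt", "p0", ("a", "b")),
    ("p0", "mov", "m", ("a",)),
    ("!p0", "mov", "m", ("b",)),
]


def predicated_max(a, b):
    return run(MAX_PROGRAM, {"a": a, "b": b})["m"]
```

Both guarded moves occupy an issue slot regardless of the outcome, which is the usual trade-off of this technique: the cost of a conditional branch is exchanged for the cost of instructions that may turn out to be NOPs.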
1.5 Hardware Loops
Many innermost loops in digital signal processing applications have a fixed number
of iterations and a fixed-length loop body consisting of straight-line code. Some
DSP processors therefore support a hardware loop construct. A special hardware
loop setup instruction at the loop entry initializes an iteration count register and also
specifies the number of subsequent instructions that are supposed to form the loop
body. The iteration count register is advanced automatically after every execution of
the loop body; no separate add instruction is necessary for that purpose. A backward
branch instruction from the end to the beginning of the loop body is now no longer
necessary either, as the processor automatically resets its program counter to the
first loop instruction, unless the iteration count has reached its final value. Hardware
loops thus have no overhead for loop control per loop iteration and only a marginal