Clock cycle   1    2    3    4    5    6    7    8
Fetch:    I1  I2  I3  I4  I5  I6
Decode:   ...  I1  I3  I4  I5
Execute:  ...  ...  I1  I3  I4
Store:    ...  ...  ...  I1  I3
Figure 13.10 Illustrating the pipeline stalls caused by cancelling the execution of instruction
I2 when the branch instruction I1 is encountered.
out all speculative work) and restarted. For a pipeline with 20 stages, it will take about
20 instructions before all stages of the pipeline are filled and the pipeline is working
at full capacity again.
Avoiding pipeline flushes can greatly improve performance in the worst case. The
best solution is to rewrite code so as to avoid branches in the first place. Improvements
can also be had by rearranging code (if statements and other conditionals, in
particular) so that control flow becomes more predictable.
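As a rough illustration of making control flow more predictable, the sketch below sums the array elements that pass a threshold test. The names SumAtLeast, SumAtLeastSorted, and CompareU32 are hypothetical, and uint32 is assumed to be a typedef for the standard uint32_t. When the input is in random order the outcome of the if test changes erratically from one iteration to the next; after sorting, it changes only once. Whether the cost of sorting pays off depends on how often the data is reused, and some compilers may replace the branch with a conditional move on their own.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t uint32; /* assumed definition of the uint32 type used in the text */

/* Sum the elements that are >= threshold. The if statement inside the
   loop is the branch whose predictability is of interest. */
uint64_t SumAtLeast(const uint32 *v, size_t n, uint32 threshold)
{
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (v[i] >= threshold) /* hard to predict when v is in random order */
            sum += v[i];
    }
    return sum;
}

/* Three-way comparison for qsort, written to avoid overflow on subtraction. */
static int CompareU32(const void *a, const void *b)
{
    uint32 x = *(const uint32 *)a;
    uint32 y = *(const uint32 *)b;
    return (x > y) - (x < y);
}

/* Sort first (modifying v in place) so that the test is false for a
   prefix of the array and true for the remainder. */
uint64_t SumAtLeastSorted(uint32 *v, size_t n, uint32 threshold)
{
    qsort(v, n, sizeof v[0], CompareU32);
    return SumAtLeast(v, n, threshold);
}
```

Both functions return the same sum; only the order in which the branch outcomes occur differs.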
Rewriting code to avoid branches often involves using various bit-shifting "tricks"
to compute and operate on Boolean values. It can also involve coercing a compiler to
emit special predicated CPU instructions that are or are not executed depending on
the status of previously computed control flags. Most modern CPUs have predicated
instructions in their instruction sets. Use of predicated instructions differs from
compiler to compiler, and thus examining the generated code is necessary to understand
under what circumstances they are emitted.
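A minimal sketch of one such trick is a mask-based select, shown here with the hypothetical helpers Select and Min (uint32 is again assumed to be a typedef for uint32_t). The Boolean result of a comparison is widened into an all-ones or all-zeros mask, which then picks one of two values with bitwise operations instead of an if statement. Compilers commonly turn the comparison into a flag-setting or conditional-move instruction rather than a branch, but this is not guaranteed, so the generated code should still be inspected.

```c
#include <stdint.h>

typedef uint32_t uint32; /* assumed definition of the uint32 type used in the text */

/* Returns a if cond is nonzero, otherwise b, without an if statement. */
uint32 Select(uint32 cond, uint32 a, uint32 b)
{
    /* (cond != 0) is 0 or 1; negating it in unsigned arithmetic gives
       0x00000000 or 0xFFFFFFFF. */
    uint32 mask = (uint32)0 - (uint32)(cond != 0);
    return (a & mask) | (b & ~mask);
}

/* Minimum of two values, expressed through the mask-based select. */
uint32 Min(uint32 a, uint32 b)
{
    return Select(a < b, a, b);
}
```

For example, Min(3, 9) and Min(9, 3) both evaluate to 3 without any conditional statement in the source.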
A full treatment of the topic of branch elimination is outside the scope of this text.
To provide one example of how branches can be eliminated, consider the following
routine for testing if a value is one of the first five prime numbers.
uint32 SmallPrime(uint32 x)
{
    if ((x == 2) || (x == 3) || (x == 5) || (x == 7) || (x == 11))
        return 1;
    return 0;
}
For this code, most compilers will generate code containing about five branches.
For example, Microsoft Visual C for the PC compiles the function to the following
assembly code.
00401150  mov  eax,dword ptr [esp+4]  ; fetch x
00401154  cmp  eax,2                  ; if (x == 2) goto RETURN_1
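One possible way to remove the branches from SmallPrime is sketched below. It is an illustration under stated assumptions rather than a definitive implementation: the name SmallPrimeBranchFree and the constant PRIME_BITS are hypothetical, and uint32 is assumed to be a typedef for uint32_t. The five comparisons are replaced by a lookup in a 32-bit constant whose bit n is set when n is one of the five primes. A separate range test, computed as a 0 or 1 value, masks off inputs of 32 or more.

```c
#include <stdint.h>

typedef uint32_t uint32; /* assumed definition of the uint32 type used in the text */

uint32 SmallPrimeBranchFree(uint32 x)
{
    /* Bit n is set when n is one of 2, 3, 5, 7, 11. */
    const uint32 PRIME_BITS =
        (1u << 2) | (1u << 3) | (1u << 5) | (1u << 7) | (1u << 11);

    /* 1 if x selects a valid bit position, 0 otherwise. */
    uint32 inRange = (uint32)(x < 32);

    /* x & 31 keeps the shift count below the width of the type, which C
       requires; ANDing with inRange extracts bit 0 and also forces the
       result to 0 for x >= 32. */
    return (PRIME_BITS >> (x & 31)) & inRange;
}
```

The function returns 1 for 2, 3, 5, 7, and 11 and 0 for every other input, matching SmallPrime. The comparison x < 32 typically compiles to a flag-setting instruction rather than a jump, though this should be confirmed in the generated code.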
 