Declare pointers and references as restrict. In theory it is sufficient to restrict
output pointers only. In practice, because pointers can be freely assigned to
other pointers through complex expressions, the compiler is sometimes unable to
track a pointer and must consider it lost. In these cases the compiler
has to conservatively assume that two pointers may alias the same target unless
both are qualified with restrict. Thus, restrict-qualifying all pointers simplifies
the alias analysis and is likely to give the best result. It is, however, important
to do so only if no aliasing is actually possible; otherwise the compiled code will
not work correctly.
Declare variables as close to actual use as possible (that is, as late as possible). With
less code between the point of declaration and the point of use, there is less code
that can cause aliasing issues and on which the compiler's alias analysis can fail. Define
new temporary variables locally instead of reusing earlier temporary variables
that happen to be in scope.
Inline small functions, especially if they take arguments by reference.
Wherever possible, manually extract common subexpressions from if statements and
loops (including the loop condition part). The compiler might not be able to do so,
due to hidden aliasing.
Minimize the abstraction penalty by not over-abstracting or over-generalizing
code.
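As a minimal sketch of the first, second, and fourth recommendations above, consider the following C99 fragment. The function names scale_may_alias and scale_no_alias, and their parameters, are hypothetical and chosen only for illustration. In the first version the compiler must assume that a store to dst[i] could modify *scale, so it is likely to reload *scale on every iteration. The second version restrict-qualifies all pointers and manually hoists the loop-invariant load into a local declared just before use.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example. No restrict: a write to dst[i] may change *scale,
   so the compiler cannot safely keep *scale in a register across iterations. */
void scale_may_alias(float *dst, const float *src, const float *scale, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * *scale;
}

/* All pointers are restrict-qualified: the caller promises that the three
   pointers refer to non-overlapping storage. Breaking that promise is
   undefined behavior. */
void scale_no_alias(float *restrict dst, const float *restrict src,
                    const float *restrict scale, size_t n)
{
    /* Cheap debug-only sanity check; it does not detect partial overlap. */
    assert(dst != src && dst != scale);

    /* Loop-invariant value extracted by hand and declared right before use. */
    const float s = *scale;
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * s;
}
```

The two functions agree whenever the arguments do not overlap. They differ when they do: calling scale_may_alias(buf, buf, &buf[0], 3) on buf = {2, 3, 4} is well defined and yields {4, 12, 16}, because the scale factor is overwritten by the first store. That call pattern must never be used with the restrict-qualified version.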
Very little easily digestible information is available on the issues of aliasing, the
abstraction penalty problem, restricted pointers, and similar topics. Two notable
exceptions are [Robison99] and [Mitchell00].
13.7 Parallelism Through SIMD Optimizations
Today virtually all CPUs have multiple processing units. Commonly a CPU contains
both an arithmetic logic unit (ALU), which performs integer operations, and a
floating-point unit (FPU), which performs floating-point arithmetic. These units tend
to operate in parallel so that the ALU can execute integer instructions at the same time
the FPU is executing floating-point instructions. Some CPUs are n-way superscalar,
meaning that they have n parallel execution units (generally ALUs) and can start
processing of up to n instructions on each cycle.
Additionally, many current microprocessor architectures offer some type of SIMD
extensions (single instruction, multiple data) to their basic instruction set. These SIMD
instructions are defined on wide registers (holding 4, 8, or even 16 scalar integer
or floating-point values) and perform a single operation on all of these values in parallel.
As such, the use of SIMD instructions promises a potential 4-fold (or 8- or 16-fold)
performance increase for code that can take advantage of this instruction set.
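To make the idea concrete, here is a minimal sketch of a four-wide floating-point addition. It assumes an x86 target with SSE and uses the intrinsics from <xmmintrin.h>; on other targets the sketch falls back to the scalar loop. The function names add_scalar and add_simd4 and the HAVE_SSE macro are hypothetical, introduced only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>  /* __m128, _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps */
#define HAVE_SSE 1      /* hypothetical macro used only in this sketch */
#endif

/* Scalar reference: one addition per loop iteration. */
void add_scalar(float *restrict out, const float *restrict a,
                const float *restrict b, size_t n)
{
    assert(n == 0 || (out && a && b));
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] + b[i];
}

/* SIMD version: four additions per instruction where SSE is available. */
void add_simd4(float *restrict out, const float *restrict a,
               const float *restrict b, size_t n)
{
    assert(n == 0 || (out && a && b));
    size_t i = 0;
#ifdef HAVE_SSE
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);            /* load 4 floats, unaligned */
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb)); /* 4 sums in one operation */
    }
#endif
    /* Remaining elements (or all of them when SSE is not available). */
    for (; i < n; i++)
        out[i] = a[i] + b[i];
}
```

The unaligned load and store variants are used so that the sketch works for any input addresses. The 4-fold figure is an upper bound; the speedup actually obtained depends on memory bandwidth and on how much of the surrounding code can be vectorized.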