These memory optimizations can be divided into two categories: hardware caching and software caching.
Hardware caching is handled through data and instruction caches built into the
CPU. Such caching happens automatically, but changes to how code and data are laid out in memory have a drastic effect on cache efficiency. Optimizing at this level involves
having a thorough understanding of how the CPU caches work and adapting code
and data structures accordingly.
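To make the layout effect concrete, consider the classic array-of-structures versus structure-of-arrays trade-off. The following sketch (the particle fields and the array size are invented for this example) shows how a structure-of-arrays layout lets a position-only update touch just the cache lines it actually needs:

// Invented example data; sizes and fields are illustrative only
const int MAX_PARTICLES = 1024;

// Array-of-structures (AoS): updating positions alone still drags the
// unused velocity and color fields through the cache line by line
struct ParticleAoS {
    float pos[3];
    float vel[3];
    unsigned char color[4];
};

// Structure-of-arrays (SoA): each field is stored contiguously, so a
// position-only pass reads and writes only position and velocity data
struct ParticlesSoA {
    float posX[MAX_PARTICLES], posY[MAX_PARTICLES], posZ[MAX_PARTICLES];
    float velX[MAX_PARTICLES], velY[MAX_PARTICLES], velZ[MAX_PARTICLES];
    unsigned char color[MAX_PARTICLES][4];
};

void AdvancePositions(ParticlesSoA &p, float dt)
{
    // Sequential, contiguous accesses use each fetched cache line fully
    for (int i = 0; i < MAX_PARTICLES; i++) {
        p.posX[i] += p.velX[i] * dt;
        p.posY[i] += p.velY[i] * dt;
        p.posZ[i] += p.velZ[i] * dt;
    }
}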
Software caching takes place at the user level and must be explicitly implemented
and managed in the application, taking into account domain-specific knowledge.
Software caching can be used, for example, to restructure data at runtime to make
the data more spatially coherent or to uncompress data on demand.
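As a minimal sketch of the idea, the following direct-mapped cache decompresses fixed-size blocks on demand. The DecompressBlock() routine, the block size, and the slot count are all assumptions made up for the example:

const int BLOCK_SIZE = 4096; // decompressed block size (assumed)
const int NUM_SLOTS = 16;    // cache slots; power of two for cheap indexing

// Assumed to exist elsewhere: decompresses block 'id' into 'dest'
void DecompressBlock(int id, unsigned char *dest);

struct BlockCache {
    int slotId[NUM_SLOTS];                     // which block each slot holds
    unsigned char data[NUM_SLOTS][BLOCK_SIZE]; // decompressed block data

    BlockCache() { for (int i = 0; i < NUM_SLOTS; i++) slotId[i] = -1; }

    // Return decompressed data for block 'id', paying the decompression
    // cost only on a cache miss
    unsigned char *GetBlock(int id) {
        int slot = id & (NUM_SLOTS - 1); // direct-mapped slot selection
        if (slotId[slot] != id) {        // miss: evict and refill the slot
            DecompressBlock(id, data[slot]);
            slotId[slot] = id;
        }
        return data[slot];               // hit: data already decompressed
    }
};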
A second important optimization is that of utilizing parallelism for both code and
data. Today, many CPUs provide SIMD instructions that operate on several data
streams in parallel. Most CPUs are also superscalar and can fetch and issue multiple
instructions in a single cycle, assuming no interdependencies between instructions.
Code written specifically to take advantage of parallelism can be made to run many
times faster than code in which parallelism is not exploited.
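As a small illustration of data parallelism (written here with SSE intrinsics, and assuming for brevity that the array length is a multiple of four), the following routine performs four additions per iteration instead of one:

#include <xmmintrin.h> // SSE intrinsics

// Adds two float arrays four elements at a time. A production version
// would also handle lengths that are not a multiple of four, and could
// use aligned loads where alignment is guaranteed
void AddArrays(float *dst, const float *a, const float *b, int n)
{
    for (int i = 0; i < n; i += 4) {
        __m128 va = _mm_loadu_ps(&a[i]); // load four floats from a
        __m128 vb = _mm_loadu_ps(&b[i]); // load four floats from b
        _mm_storeu_ps(&dst[i], _mm_add_ps(va, vb)); // four adds at once
    }
}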
Starting from the basics of CPU cache architecture, this chapter provides a thorough introduction to memory optimization. It explains how to improve low-level caching through code and data changes, and how prefetching and preloading of data can further improve performance. Practical examples of cache-optimized data structures are given, and the notion of software caching as an optimization is explored. The concept of aliasing is introduced, along with an explanation of how aliasing can cause major performance bottlenecks in performance-sensitive code; several ways of reducing aliasing are discussed in detail. The chapter also discusses how taking advantage of SIMD parallelism can improve the performance of collision code, in particular that of low-level primitive tests. Last, the chapter covers the (potentially) detrimental effects of branch instructions on modern CPU instruction pipelines, with suggestions for resolving the problem.
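As a brief preview of the aliasing issue, consider the pair of routines below. The __restrict qualifier is a widely supported compiler extension rather than standard C++, so this is a sketch of the idea rather than portable code:

// Because 'dst' may point into the same memory as 'src', the compiler
// must conservatively reload *src from memory on every iteration
void Scale(float *dst, const float *src, int n)
{
    for (int i = 0; i < n; i++) dst[i] = *src * 2.0f;
}

// With __restrict the programmer promises the pointers do not alias,
// so the compiler is free to hoist the load of *src out of the loop
void ScaleRestrict(float * __restrict dst, const float * __restrict src, int n)
{
    for (int i = 0; i < n; i++) dst[i] = *src * 2.0f;
}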
Before proceeding, it is important to stress that optimization should not be performed indiscriminately. Prior to implementing the optimizations described herein, alternative algorithmic approaches should be explored. In general, a change of algorithm can provide much more improvement than fine-tuning. Do not try to intuit where optimization is needed; always use a code profiler to guide the optimization. Only when bottlenecks have been identified and appropriate algorithms and data structures have been selected should fine-tuning be performed. Because the suggested optimizations might not be relevant to a particular architecture, it also makes sense to measure the efficiency of an optimization through profiling.
Unless routines are written in assembly code, when implementing memory optimizations it is important to have an intuitive feel for the assembly code generated for a given piece of high-level code. A compiler might handle memory accesses in a way that invalidates fine-tuned optimizations. Subtle changes in the high-level code can also cause drastic changes in the assembly code. Make it a habit to study the assembly code at regular intervals.
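With GCC or Clang, for instance, compiling with -S writes the generated assembly to a .s file; MSVC offers /FA for an assembly listing. The following sketch (an invented example) shows the kind of subtle source-level difference worth verifying in the output:

// g++ -O2 -S file.cpp   (GCC/Clang: assembly written to file.s)
// cl /O2 /FA file.cpp   (MSVC: assembly listing written to file.asm)

struct Blob {
    float m_scale;
    // Writes through 'dst' might alias m_scale, so the compiler may
    // reload the member from memory on every iteration
    void ScaleAll(float *dst, int n) {
        for (int i = 0; i < n; i++) dst[i] *= m_scale;
    }
    // Copying the member into a local first lets the compiler keep the
    // value in a register; the generated loops can differ substantially
    void ScaleAllHoisted(float *dst, int n) {
        float s = m_scale;
        for (int i = 0; i < n; i++) dst[i] *= s;
    }
};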