These memory optimizations can be divided into two categories: hardware caching and software caching.
Hardware caching is handled through data and instruction caches built into the
CPU. Such caching happens automatically, but changes to how code and data are laid out in memory have a drastic effect on cache efficiency. Optimizing at this level involves
having a thorough understanding of how the CPU caches work and adapting code
and data structures accordingly.
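To make the layout effect concrete, consider the classic array-of-structures versus structure-of-arrays trade-off. The following sketch (the particle fields and the array size are invented for this example) shows how a structure-of-arrays layout lets a position-only update touch just the cache lines it actually needs:

// Invented example data; sizes and fields are illustrative only
const int MAX_PARTICLES = 1024;

// Array-of-structures (AoS): updating positions alone still drags the
// unused velocity and color fields through the cache line by line
struct ParticleAoS {
    float pos[3];
    float vel[3];
    unsigned char color[4];
};

// Structure-of-arrays (SoA): each field is stored contiguously, so a
// position-only pass reads and writes only position and velocity data
struct ParticlesSoA {
    float posX[MAX_PARTICLES], posY[MAX_PARTICLES], posZ[MAX_PARTICLES];
    float velX[MAX_PARTICLES], velY[MAX_PARTICLES], velZ[MAX_PARTICLES];
    unsigned char color[MAX_PARTICLES][4];
};

void AdvancePositions(ParticlesSoA &p, float dt)
{
    // Sequential, contiguous accesses use each fetched cache line fully
    for (int i = 0; i < MAX_PARTICLES; i++) {
        p.posX[i] += p.velX[i] * dt;
        p.posY[i] += p.velY[i] * dt;
        p.posZ[i] += p.velZ[i] * dt;
    }
}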
Software caching takes place at the user level and must be explicitly implemented
and managed in the application, taking into account domain-specific knowledge.
Software caching can be used, for example, to restructure data at runtime to make
the data more spatially coherent or to uncompress data on demand.
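As a minimal sketch of the idea, the following direct-mapped cache decompresses fixed-size blocks on demand. The DecompressBlock() routine, the block size, and the slot count are all assumptions made up for the example:

const int BLOCK_SIZE = 4096; // decompressed block size (assumed)
const int NUM_SLOTS = 16;    // cache slots; power of two for cheap indexing

// Assumed to exist elsewhere: decompresses block 'id' into 'dest'
void DecompressBlock(int id, unsigned char *dest);

struct BlockCache {
    int slotId[NUM_SLOTS];                     // which block each slot holds
    unsigned char data[NUM_SLOTS][BLOCK_SIZE]; // decompressed block data

    BlockCache() { for (int i = 0; i < NUM_SLOTS; i++) slotId[i] = -1; }

    // Return decompressed data for block 'id', paying the decompression
    // cost only on a cache miss
    unsigned char *GetBlock(int id) {
        int slot = id & (NUM_SLOTS - 1); // direct-mapped slot selection
        if (slotId[slot] != id) {        // miss: evict and refill the slot
            DecompressBlock(id, data[slot]);
            slotId[slot] = id;
        }
        return data[slot];               // hit: data already decompressed
    }
};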
A second important optimization is that of utilizing parallelism for both code and
data. Today, many CPUs provide SIMD instructions that operate on several data
streams in parallel. Most CPUs are also superscalar and can fetch and issue multiple
instructions in a single cycle, assuming no interdependencies between instructions.
Code written specifically to take advantage of parallelism can be made to run many
times faster than code in which parallelism is not exploited.
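As a small illustration of data parallelism (written here with SSE intrinsics, and assuming for brevity that the array length is a multiple of four), the following routine performs four additions per iteration instead of one:

#include <xmmintrin.h> // SSE intrinsics

// Adds two float arrays four elements at a time. A production version
// would also handle lengths that are not a multiple of four, and could
// use aligned loads where alignment is guaranteed
void AddArrays(float *dst, const float *a, const float *b, int n)
{
    for (int i = 0; i < n; i += 4) {
        __m128 va = _mm_loadu_ps(&a[i]); // load four floats from a
        __m128 vb = _mm_loadu_ps(&b[i]); // load four floats from b
        _mm_storeu_ps(&dst[i], _mm_add_ps(va, vb)); // four adds at once
    }
}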
Starting from the basics of CPU cache architecture, this chapter provides a thorough introduction to memory optimization. It explains how to improve low-level caching through code and data changes, and how prefetching and preloading of data can further improve performance. Practical examples of cache-optimized data structures are given, and the notion of software caching as an optimization is explored. The concept of aliasing is introduced, along with an explanation of how aliasing can cause major performance bottlenecks in performance-sensitive code; several ways of reducing aliasing are discussed in detail. The chapter also discusses how taking advantage of SIMD parallelism can improve the performance of collision code, in particular that of low-level primitive tests. Last, the chapter covers the (potentially) detrimental effects of branch instructions on modern CPU instruction pipelines, with suggestions for resolving the problem.
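As a brief preview of the aliasing issue, consider the pair of routines below. The __restrict qualifier is a widely supported compiler extension rather than standard C++, so this is a sketch of the idea rather than portable code:

// Because 'dst' may point into the same memory as 'src', the compiler
// must conservatively reload *src from memory on every iteration
void Scale(float *dst, const float *src, int n)
{
    for (int i = 0; i < n; i++) dst[i] = *src * 2.0f;
}

// With __restrict the programmer promises the pointers do not alias,
// so the compiler is free to hoist the load of *src out of the loop
void ScaleRestrict(float * __restrict dst, const float * __restrict src, int n)
{
    for (int i = 0; i < n; i++) dst[i] = *src * 2.0f;
}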
Before proceeding, it is important to stress that optimization should not be performed indiscriminately. Prior to implementing the optimizations described herein, alternative algorithmic approaches should be explored. In general, a change of algorithm can provide much more improvement than fine-tuning. Do not try to intuit where optimization is needed; always use a code profiler to guide the optimization. Only when bottlenecks have been identified and appropriate algorithms and data structures have been selected should fine-tuning be performed. Because the suggested optimizations might not be relevant to a particular architecture, it also makes sense to measure the efficiency of an optimization through profiling.
Unless routines are written in assembly code, when implementing memory optimizations it is important to have an intuitive feel for the assembly code generated for a given piece of high-level code. A compiler might handle memory accesses in a way that invalidates fine-tuned optimizations. Subtle changes in the high-level code can also cause drastic changes in the assembly code. Make it a habit to study the assembly code at regular intervals.
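With GCC or Clang, for instance, compiling with -S writes the generated assembly to a .s file; MSVC offers /FA for an assembly listing. The following sketch (an invented example) shows the kind of subtle source-level difference worth verifying in the output:

// g++ -O2 -S file.cpp   (GCC/Clang: assembly written to file.s)
// cl /O2 /FA file.cpp   (MSVC: assembly listing written to file.asm)

struct Blob {
    float m_scale;
    // Writes through 'dst' might alias m_scale, so the compiler may
    // reload the member from memory on every iteration
    void ScaleAll(float *dst, int n) {
        for (int i = 0; i < n; i++) dst[i] *= m_scale;
    }
    // Copying the member into a local first lets the compiler keep the
    // value in a register; the generated loops can differ substantially
    void ScaleAllHoisted(float *dst, int n) {
        float s = m_scale;
        for (int i = 0; i < n; i++) dst[i] *= s;
    }
};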