Optimization - Real-Time Collision Detection


Currently available SIMD extensions include SSE and SSE2 from Intel, 3DNow! from AMD, and AltiVec from Motorola and IBM. Sony's PlayStation 2 takes parallelism to a new level by offering both multimedia extensions on its MIPS CPU core (featuring two ALUs and one FPU) and two separate vector coprocessors running in parallel with the main CPU, each with parallel instruction streams. Overall, the PlayStation 2 is capable of executing up to seven different instructions at the same time! New game consoles like the PlayStation 3 and the Xbox 2 will offer even more parallelism. The following focuses on the use of SIMD instructions, which can be used to optimize sequential code in two different ways.

● Instruction-level parallelism. Existing algorithms can be optimized by identifying similar operations and computing these in parallel. For example, the multiplications in a dot product could be computed in parallel using a single SIMD multiplication. Because most computations have serial aspects, some parts of a process, such as the summation of the partial products within the dot product, cannot be optimized using instruction-level parallelism.

● Data-level parallelism. SIMD can also be used to operate on completely different data in parallel. For example, a typical use could be to compute four dot products in parallel. Data-level parallelism is the natural use of SIMD instructions, as the name (single instruction, multiple data) implies.
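The two styles can be contrasted in a minimal C sketch. It assumes the GCC/Clang vector extensions (`__attribute__((vector_size(16)))`), which let a four-float type be used with ordinary arithmetic operators and are compiled to SIMD instructions where the target supports them; the names `v4f`, `dot_ilp`, and `dot4_dlp` are hypothetical. The first function parallelizes the three multiplications of a single dot product but still sums the partial products serially; the second computes four unrelated dot products from data stored as a structure of arrays.

```c
/* Minimal sketch assuming the GCC/Clang vector extensions; not ISO C.
   v4f is a hypothetical name for four floats held in one 128-bit value. */
typedef float v4f __attribute__((vector_size(16)));

/* Instruction-level parallelism: one dot product of two 3D vectors
   stored as (x, y, z, 0). The three multiplications are done by one
   vector multiply, but summing the partial products is still serial. */
static float dot_ilp(v4f a, v4f b)
{
    v4f p = a * b;              /* one SIMD multiply, four lanes */
    return p[0] + p[1] + p[2];  /* serial horizontal sum; lane 3 unused */
}

/* Data-level parallelism: four independent dot products at once.
   Structure-of-arrays layout: ax holds the x components of four
   different vectors a0..a3, ay their y components, and so on. */
static v4f dot4_dlp(v4f ax, v4f ay, v4f az, v4f bx, v4f by, v4f bz)
{
    /* Lane i of the result is dot(a_i, b_i); no horizontal sum needed. */
    return ax * bx + ay * by + az * bz;
}
```

In the data-level version every lane does useful work and no horizontal summation is required; the price is that the data usually has to be rearranged into the one-register-per-component layout first.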

An example of a successful application of SIMD optimization is the acceleration of ray intersections in an interactive ray-tracing application, by testing four rays in parallel against a triangle using SIMD instructions [Wald01]. To feed the test, four rays (arranged in a 2 × 2 cluster) are simultaneously traversed through a k-d tree. The traversal decision is also made in parallel using SIMD instructions: a subtree is visited if at least one ray intersects its defining volume. Although this traversal decision causes some overhead, little extra work is performed in practice because the 2 × 2 cluster of rays is highly coherent. However, this kind of query clustering is less applicable to collision detection in general, where a group of ray queries tends to span a much wider field. For the SIMD triangle test, a speedup of 3.5 to 3.7 times compared to the non-SIMD C code was reported. The clustered traversal of four rays in parallel provided an additional speedup of roughly a factor of two.
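The "visit if at least one ray intersects" rule can be sketched for an inner node of a k-d tree, where each ray is already known to overlap the node and only the split plane matters. This is not the code from [Wald01]: it again assumes GCC/Clang vector extensions, the names `packet_decide` and `any_lane` and the parameter layout are hypothetical, and it assumes all four rays have a positive, nonzero direction component along the split axis so that every ray enters the lower child first. Comparisons on these vector types yield per-lane masks (all bits set for true, zero for false).

```c
/* Minimal sketch assuming GCC/Clang vector extensions; names are hypothetical. */
typedef float v4f __attribute__((vector_size(16)));
typedef int   v4i __attribute__((vector_size(16)));  /* lane masks: -1 true, 0 false */

/* Returns nonzero if any lane of the mask is set. */
static int any_lane(v4i m)
{
    return (m[0] | m[1] | m[2] | m[3]) != 0;
}

/* Traversal decision for a four-ray packet at an inner k-d tree node whose
   split plane lies at coordinate `split` on the node's axis.
   org, dir:   that axis' component of each ray's origin and direction
               (assumed here: all dir lanes positive and nonzero)
   tmin, tmax: each ray's current parametric interval inside the node
   active:     mask of rays still being traced */
static void packet_decide(v4f org, v4f dir, v4f tmin, v4f tmax, v4i active,
                          float split, int *visit_lower, int *visit_upper)
{
    v4f s = {split, split, split, split};        /* broadcast to all lanes */
    v4f tsplit = (s - org) / dir;                /* four ray-plane distances at once */
    v4i need_lower = (tmin < tsplit) & active;   /* segment starts below the plane */
    v4i need_upper = (tmax > tsplit) & active;   /* segment ends above the plane */
    /* A child is visited if at least one active ray in the packet needs it. */
    *visit_lower = any_lane(need_lower);
    *visit_upper = any_lane(need_upper);
}
```

Rays that do not need a visited child simply ride along with the packet; this is the traversal overhead described above, which stays small when the rays are as coherent as a 2 × 2 cluster. For a packet whose direction components are all negative, the roles of the two children would be swapped.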

Compilers in general remain rather underdeveloped in their support of SIMD instructions. The support mainly consists of providing intrinsic functions (built-in functions corresponding more or less one-to-one to the assembly instructions). Effective use of SIMD instructions still largely relies on hand coding in assembly language.

To illustrate how effective SIMD instructions can be in collision detection tests, the next three sections outline possible implementations of four simultaneous sphere-sphere, sphere-AABB, and AABB-AABB tests. It is assumed here that the SIMD architecture can hold four (floating-point) values in the same register and operate on them simultaneously in a single instruction. SIMD intersection tests for ray-box
