C = (30192, 15771, 9030), D = (29971, 15764, 9498), and E = (30304, 15888, 9133).
These could be represented as (-52, -9, 121), (51, 103, 21), (54, -95, -234), (-167, -102, 234), and (166, 22, -131) with a common 16-bit offset of (30138, 15866, 9264). Here, the quantized vertices could be stored using (signed) 9-bit values, a considerable saving over the original format.
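To make the idea concrete, here is a minimal sketch (not from the original text; all names are illustrative) of packing each quantized vertex as three signed 9-bit displacements in a single 32-bit word, shared against a common 16-bit origin:

#include <cstdint>

// Common 16-bit origin shared by a group of quantized vertices,
// e.g. (30138, 15866, 9264) in the example above
struct VertexGroup {
    int16_t originX, originY, originZ;
};

// Pack three signed 9-bit displacements, each in [-256, 255], into 27 bits
inline uint32_t PackDisplacement(int dx, int dy, int dz)
{
    return (((uint32_t)dx & 0x1FF)) |
           (((uint32_t)dy & 0x1FF) << 9) |
           (((uint32_t)dz & 0x1FF) << 18);
}

// Recover a signed value from a 9-bit field
inline int SignExtend9(uint32_t bits)
{
    int v = (int)(bits & 0x1FF);
    return (v >= 256) ? v - 512 : v;
}

// Reconstruct the full 16-bit vertex from the group origin and the packed word
inline void UnpackVertex(const VertexGroup &g, uint32_t packed,
                         int16_t &x, int16_t &y, int16_t &z)
{
    x = (int16_t)(g.originX + SignExtend9(packed));
    y = (int16_t)(g.originY + SignExtend9(packed >> 9));
    z = (int16_t)(g.originZ + SignExtend9(packed >> 18));
}

Three packed components occupy 27 bits, compared to 48 bits for the raw 16-bit triple; the five spare bits could hold per-vertex flags.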
Even more space can be saved by allowing the origin to float from one vertex
to the next. That is, the displacement is computed from the previous vertex, thus
effectively implementing a delta compression scheme. Such a scheme works best
when the vertices represent an indexed mesh so that their order can be shuffled to
facilitate a better compression ratio. Rearranging the vertices A through E in the order
D, A, B, E, C gives the first vertex as (29971, 15764, 9498) and the following vertices as (115, 93, -113), (103, 112, -100), (115, -81, -122), and (-112, -117, -73). The vertex components can now be represented using (signed) 8-bit values.
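A possible encoder/decoder pair for this delta scheme (an illustrative sketch, not code from the original) keeps the first vertex at full 16-bit precision and stores each subsequent vertex as a signed 8-bit displacement from its predecessor, assuming the vertices have been reordered, as above, so that all displacements fit in a byte:

#include <cstdint>
#include <vector>

struct Vertex16 { int16_t x, y, z; };
struct Delta8   { int8_t dx, dy, dz; };

// Encode: first vertex stored in full, the rest as 8-bit deltas from the
// previous vertex. Assumes verts is nonempty and consecutive differences
// fit in [-128, 127].
void DeltaEncode(const std::vector<Vertex16> &verts,
                 Vertex16 &first, std::vector<Delta8> &deltas)
{
    first = verts[0];
    deltas.clear();
    for (size_t i = 1; i < verts.size(); i++) {
        Delta8 d;
        d.dx = (int8_t)(verts[i].x - verts[i - 1].x);
        d.dy = (int8_t)(verts[i].y - verts[i - 1].y);
        d.dz = (int8_t)(verts[i].z - verts[i - 1].z);
        deltas.push_back(d);
    }
}

// Decode by accumulating the deltas back onto a running vertex
std::vector<Vertex16> DeltaDecode(Vertex16 first,
                                  const std::vector<Delta8> &deltas)
{
    std::vector<Vertex16> out;
    out.push_back(first);
    Vertex16 v = first;
    for (size_t i = 0; i < deltas.size(); i++) {
        v.x = (int16_t)(v.x + deltas[i].dx);
        v.y = (int16_t)(v.y + deltas[i].dy);
        v.z = (int16_t)(v.z + deltas[i].dz);
        out.push_back(v);
    }
    return out;
}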
Spatial partitioning methods can be combined with storing the data in the leaves
in a quantized format because the range of the data in a leaf will generally only span
a small part of the entire world. Quantization of localized data saves memory yet
allows a full range of coordinate values. If the leaves are cached after decompression,
the extra cost associated with decompression is generally counteracted by amortization. Having leaf contents stored relative to an origin corresponding to the spatial
position of the parent node can help increase robustness because computations are
now performed near that origin, maintaining more precision in the calculations.
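As an illustration of this combination (the layout and field names are assumptions, not from the original), a leaf might store its vertices as 16-bit values relative to the parent node's position and decompress them on access:

#include <cstdint>

// Leaf data quantized relative to the parent node's world-space position
struct LeafNode {
    float originX, originY, originZ;  // full-precision node origin
    float scale;                      // quantization step (world units per unit)
    uint16_t numVerts;
    // Vertex positions relative to the origin; because a leaf spans only a
    // small part of the world, 16 bits per component suffice
    const int16_t *quantizedVerts;    // 3 * numVerts components
};

// Decompress vertex i into floating-point world coordinates
inline void DecompressVertex(const LeafNode &leaf, int i,
                             float &x, float &y, float &z)
{
    x = leaf.originX + leaf.scale * leaf.quantizedVerts[3 * i + 0];
    y = leaf.originY + leaf.scale * leaf.quantizedVerts[3 * i + 1];
    z = leaf.originZ + leaf.scale * leaf.quantizedVerts[3 * i + 2];
}

Keeping intermediate computations in the small origin-relative coordinates, rather than in full world coordinates, is what preserves the extra precision mentioned above.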
13.3.3 Prefetching and Preloading
Because stalls caused by load accesses to uncached memory cause severe slowdowns,
most modern CPUs provide mechanisms for helping avoid these stalls. One such
mechanism is the prefetch instruction, which allows the programmer to direct the
CPU to load a given cache line into cache while the CPU continues to execute as
normal. By issuing a prefetch instruction in advance, the data so loaded will be in
cache by the time the program needs it, allowing the CPU to execute at full speed
without stalling. The risk is that if the prefetch instruction is issued too early then
by the time the CPU is ready for the data it might have been flushed from cache.
Conversely, if it is issued too late then there is little benefit to the prefetch. Ideally,
the prefetch instruction should be issued early enough to account for the memory
latency of a load, but not more.
For linear structures, prefetch instructions are both effective and easy to use. For
example, a simple processing loop such as
// Loop through and process all 4n elements
for (int i = 0; i < 4 * n; i++)
    Process(elem[i]);
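One way such a loop might be rewritten with explicit prefetching is to request the element a fixed number of iterations ahead of the one currently being processed; the sketch below assumes the GCC/Clang __builtin_prefetch intrinsic and an illustrative lookahead distance, neither of which comes from the original text.

// Same loop with software prefetching: while element i is being processed,
// ask the CPU to start loading the element LOOKAHEAD iterations ahead so it
// is (ideally) already in cache when the loop reaches it
const int LOOKAHEAD = 8;  // tune to cover load latency vs. work per element
for (int i = 0; i < 4 * n; i++) {
    if (i + LOOKAHEAD < 4 * n)
        __builtin_prefetch(&elem[i + LOOKAHEAD]);
    Process(elem[i]);
}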
 