Quaternions Revisited - GPU Pro: Advanced Rendering Techniques - page 373


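#include <algorithm>  // std::max
#include <cstdint>
#include <cstring>    // std::memcpy

typedef std::uint16_t uint16;
typedef std::uint32_t uint32;
typedef std::int32_t  int32;

inline uint16 Float2fp16 ( float x )
{
    // Reinterpret the float's bits (memcpy avoids strict-aliasing problems).
    uint32 dwFloat;
    std::memcpy(&dwFloat, &x, sizeof(dwFloat));
    uint32 dwMantissa = dwFloat & 0x7fffff;                  // 23-bit float32 mantissa
    int32 iExp = (int)((dwFloat >> 23) & 0xff) - (int)0x70;  // rebias exponent: 127 - 15 = 0x70
    uint32 dwSign = dwFloat >> 31;

    int result = ( (dwSign << 15)
                 | (((uint32)(std::max<int32>(iExp, 0))) << 10)  // clamp underflowing exponents to 0
                 | (dwMantissa >> 13) );                         // keep the top 10 mantissa bits
    result = result & 0xFFFF;
    return (uint16)result;
}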

Listing 1.2. float32 to float16 fast conversion.

large angle step resulted in non-smooth rotation and difficulties in aligning static instanced geometry for map designers.

Finally, we experimented with quaternion packing for skinning. Initially, we stored each skinning bone as a 3 × 3 scale-rotation matrix plus a position (12 values in total) in a float32 RGBA texture. This required three vertex texture fetches per bone, and for better alignment we padded each bone to four float32 RGBA texels, wasting 25% of the skinning texture. After switching to quaternions, we used SQT encoding with a uniform scale, which needs only eight values in total. This allowed us to store each bone in just two texels, requiring only two vertex texture fetches per bone. Because all skinning values were packed into a single texture, every SQT component had to use the same format. Scale and translation needed a floating-point format, which is why we picked float16 (a.k.a. half). The only problem we had to tackle in packing was the low speed of the standard DirectX fp16 packing function, which caused significant CPU stalls. To address this, we used a fast packing method similar to [Mittring 08], but enhanced it to work for all domain values, unlike the original. The resulting code is shown in Listing 1.2.
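A minimal sketch of the two-texel bone layout described above, assuming an RGBA16F skinning texture with the rotation quaternion in the first texel and translation plus uniform scale in the second. This channel order and the names `BoneSQT`, `PackedBone`, and `PackBone` are hypothetical, not taken from the chapter; the fp16 conversion restates the logic of Listing 1.2 so the block compiles on its own.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Same logic as Listing 1.2, restated so this block is self-contained.
inline std::uint16_t Float2fp16(float x)
{
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof(bits));
    std::uint32_t mantissa = bits & 0x7fffff;
    std::int32_t exponent = (std::int32_t)((bits >> 23) & 0xff) - 0x70;  // rebias 127 -> 15
    std::uint32_t sign = bits >> 31;
    std::uint32_t result = (sign << 15)
                         | ((std::uint32_t)std::max<std::int32_t>(exponent, 0) << 10)
                         | (mantissa >> 13);
    return (std::uint16_t)(result & 0xFFFF);
}

// Hypothetical SQT bone with a uniform scale: 1 + 4 + 3 = 8 values.
struct BoneSQT {
    float scale;  // uniform scale
    float q[4];   // rotation quaternion x, y, z, w
    float t[3];   // translation x, y, z
};

// Two RGBA16F texels = 8 halves per bone.
// Assumed order: texel 0 = (qx, qy, qz, qw), texel 1 = (tx, ty, tz, scale).
using PackedBone = std::array<std::uint16_t, 8>;

inline PackedBone PackBone(const BoneSQT& b)
{
    PackedBone p{};
    for (int i = 0; i < 4; ++i) p[i] = Float2fp16(b.q[i]);
    for (int i = 0; i < 3; ++i) p[4 + i] = Float2fp16(b.t[i]);
    p[7] = Float2fp16(b.scale);
    return p;
}

// Old layout: four float32 RGBA texels per bone; new layout: two half RGBA texels.
constexpr std::size_t kOldBytesPerBone = 4 * 4 * sizeof(float);          // 64 bytes
constexpr std::size_t kNewBytesPerBone = 2 * 4 * sizeof(std::uint16_t);  // 16 bytes
```

With this layout each bone occupies 16 bytes, a quarter of the 64 bytes used by the padded four-texel float32 layout, and a shader would fetch exactly two texels per bone.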


1.9 Comparison

After completing the transition to quaternions, we compared the two approaches; the results are shown in Table 1.2. As can be observed, using quaternions significantly reduces the memory footprint. For normal mapping, the number of nrm, rsq, and rcp instructions also decreases, which yields a larger performance gain than the raw ALU figures alone would suggest. For skinning and instancing, the instruction count increases, but in our experience, ALU throughput has not been a bottleneck in vertex shaders.
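One plausible source of the higher instruction count for skinning and instancing can be illustrated on the CPU. The C++ sketch below is not the chapter's shader code, and `Vec3`, `Quat`, `Mat3`, and the function names are hypothetical. Rotating a vector by a unit quaternion takes two cross products per vector, whereas a 3 × 3 matrix, once expanded, costs nine multiplies and six adds per vector.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };
struct Quat { float x, y, z, w; };  // unit quaternion; w is the scalar part
struct Mat3 { float m[3][3]; };     // row-major 3x3 rotation matrix

inline Vec3 Cross(const Vec3& a, const Vec3& b)
{
    return { a.y * b.z - a.z * b.y,
             a.z * b.x - a.x * b.z,
             a.x * b.y - a.y * b.x };
}

// Builds a unit quaternion from a unit-length axis and an angle in radians.
inline Quat FromAxisAngle(const Vec3& axis, float radians)
{
    float s = std::sin(0.5f * radians);
    return { axis.x * s, axis.y * s, axis.z * s, std::cos(0.5f * radians) };
}

// Quaternion path: v' = v + 2 u x (u x v + w v), where u = (q.x, q.y, q.z).
// Two cross products plus scale/add steps for every rotated vector.
inline Vec3 RotateByQuat(const Quat& q, const Vec3& v)
{
    Vec3 u{ q.x, q.y, q.z };
    Vec3 c = Cross(u, v);
    Vec3 t{ c.x + q.w * v.x, c.y + q.w * v.y, c.z + q.w * v.z };
    Vec3 d = Cross(u, t);
    return { v.x + 2.0f * d.x, v.y + 2.0f * d.y, v.z + 2.0f * d.z };
}

// Matrix path: expand the quaternion once; each vector then costs 9 muls and 6 adds.
inline Mat3 QuatToMat3(const Quat& q)
{
    float xx = q.x * q.x, yy = q.y * q.y, zz = q.z * q.z;
    float xy = q.x * q.y, xz = q.x * q.z, yz = q.y * q.z;
    float wx = q.w * q.x, wy = q.w * q.y, wz = q.w * q.z;
    return {{ { 1 - 2 * (yy + zz), 2 * (xy - wz),     2 * (xz + wy) },
              { 2 * (xy + wz),     1 - 2 * (xx + zz), 2 * (yz - wx) },
              { 2 * (xz - wy),     2 * (yz + wx),     1 - 2 * (xx + yy) } }};
}

inline Vec3 MulMat3(const Mat3& M, const Vec3& v)
{
    return { M.m[0][0] * v.x + M.m[0][1] * v.y + M.m[0][2] * v.z,
             M.m[1][0] * v.x + M.m[1][1] * v.y + M.m[1][2] * v.z,
             M.m[2][0] * v.x + M.m[2][1] * v.y + M.m[2][2] * v.z };
}
```

Both paths give the same result for a unit quaternion. The quaternion path trades extra arithmetic per vector for storing four values instead of nine, which matches the memory-versus-ALU trade-off described above.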
