inline uint16 Float2fp16(float x)
{
    // Reinterpret the float32 bit pattern as an integer.
    uint32 dwFloat = *((uint32*)&x);
    uint32 dwMantissa = dwFloat & 0x7fffff;
    // Rebias the exponent from float32 (127) to float16 (15): 127 - 15 = 0x70.
    int32 iExp = (int)((dwFloat >> 23) & 0xff) - (int)0x70;
    uint32 dwSign = dwFloat >> 31;
    int result = ((dwSign << 15)
                  | (((uint32)(max(iExp, 0))) << 10)
                  | (dwMantissa >> 13));
    result = result & 0xFFFF;
    return (uint16)result;
}
Listing 1.2. float32 to float16 fast conversion.
large angle step resulted in non-smooth rotation and difficulties in aligning static
instanced geometry for map designers.
Finally, we experimented with quaternion packing for skinning. Initially, we
stored each skinning bone as a 3 × 3 scale-rotation matrix plus a position (12
values in total) in a float32 RGBA texture. This required three vertex texture
fetches per bone, and we wasted 25% of the skinning texture on padding for
better alignment, which resulted in four float32 RGBA texels per bone. After
switching to quaternions, we used SQT encoding with a uniform scale, which
takes eight values in total. That allowed us to store a single bone in just two
texels and thus to make only two vertex texture fetches per bone. Because we
packed the skinning values in a texture, all SQT components had to share the
same format. Scale and translation needed a floating-point format, which is why
we picked float16 (a.k.a. half). The only issue we had to tackle in packing was
the low speed of the standard DirectX fp16 packing function, which resulted in
significant CPU stalls. To address this, we used a fast packing method similar
to [Mittring 08]. However, we enhanced this method to make it work for all
domain values, unlike the original one. The resulting code is shown in
Listing 1.2.
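The two-texel layout described above can be sketched as follows. This is a minimal illustration, not the authors' code: the `BoneSQT` and `PackedBone` types, the `PackBone` helper, and the component order inside the two texels are all assumptions made for the example. The conversion follows the same bit manipulation as Listing 1.2 but uses `std::memcpy` instead of a pointer cast, and it assumes finite inputs whose magnitude fits the normal float16 range (the mantissa is truncated, not rounded).

```cpp
#include <algorithm>
#include <array>
#include <cstdint>
#include <cstring>

// Truncating float32 -> float16 conversion in the spirit of Listing 1.2.
// Assumption: x is finite and within the normal float16 range.
inline std::uint16_t FloatToHalf(float x)
{
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof(bits));  // well-defined bit reinterpretation

    const std::uint32_t mantissa = bits & 0x7fffffu;
    // Rebias the exponent: float32 bias 127, float16 bias 15, difference 0x70.
    const std::int32_t exponent =
        static_cast<std::int32_t>((bits >> 23) & 0xffu) - 0x70;
    const std::uint32_t sign = bits >> 31;

    const std::uint32_t result =
        (sign << 15)
        | (static_cast<std::uint32_t>(std::max<std::int32_t>(exponent, 0)) << 10)
        | (mantissa >> 13);  // keep the top 10 mantissa bits
    return static_cast<std::uint16_t>(result & 0xffffu);
}

// Hypothetical CPU-side bone description: uniform scale, rotation quaternion
// (x, y, z, w), and translation. Eight values in total.
struct BoneSQT
{
    float scale;
    float rotation[4];
    float translation[3];
};

// Two RGBA16F texels = eight 16-bit channels.
using PackedBone = std::array<std::uint16_t, 8>;

// Assumed layout: texel 0 holds the quaternion, texel 1 holds the
// translation in RGB and the uniform scale in A.
inline PackedBone PackBone(const BoneSQT& bone)
{
    return PackedBone{{
        FloatToHalf(bone.rotation[0]),
        FloatToHalf(bone.rotation[1]),
        FloatToHalf(bone.rotation[2]),
        FloatToHalf(bone.rotation[3]),
        FloatToHalf(bone.translation[0]),
        FloatToHalf(bone.translation[1]),
        FloatToHalf(bone.translation[2]),
        FloatToHalf(bone.scale),
    }};
}
```

With this layout, a vertex shader would need one fetch for the rotation and one for the translation and scale, matching the two fetches per bone mentioned in the text.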
1.9 Comparison
After completing the transition to quaternions, we compared the two approaches;
the results are shown in Table 1.2. As can be seen, using quaternions
significantly reduces the memory footprint. In the case of normal mapping, the
number of nrm, rsq, and rcp instructions also decreases, which yields a larger
performance gain than one would expect from the raw ALU figures. In the case of
skinning and instancing, the instruction count increases, but in our
experience, ALUs have not been a bottleneck in vertex shaders.
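As a rough illustration of the skinning-texture savings, the per-bone storage can be worked out from the texel counts given earlier in the text (four float32 RGBA texels before, two float16 RGBA texels after). The sketch below is only that arithmetic; the constant and function names are hypothetical, and the numbers are derived from the texel counts rather than taken from Table 1.2.

```cpp
#include <cstddef>

// An RGBA texel has four channels.
constexpr std::size_t kChannelsPerTexel = 4;

// Bytes used by one bone in the skinning texture.
constexpr std::size_t BytesPerBone(std::size_t texels, std::size_t bytesPerChannel)
{
    return texels * kChannelsPerTexel * bytesPerChannel;
}

// Matrix layout: four float32 RGBA texels per bone (4 bytes per channel).
constexpr std::size_t kMatrixBytesPerBone = BytesPerBone(4, 4);

// SQT layout: two float16 RGBA texels per bone (2 bytes per channel).
constexpr std::size_t kSqtBytesPerBone = BytesPerBone(2, 2);

static_assert(kMatrixBytesPerBone == 64, "four float32 RGBA texels");
static_assert(kSqtBytesPerBone == 16, "two float16 RGBA texels");
```

Under these assumptions, the SQT layout takes a quarter of the texture memory per bone and halves the number of vertex texture fetches.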