Profile      SM Res × N Cscd   First cascade   N Slices   N Samples   Time, ms   Mem, MB
Brute force  1024² × 6         1               -          -           209.6      -
High qual    1024² × 6         1               1024       1024        23.6       60
Balanced     1024² × 5         1               512        512         10.35      18
High perf    512² × 4          2               512        256         6.19       7
Table 2.2. Quality settings for Intel HD Graphics 5000 (1280 × 720 resolution).
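To put the timings of Table 2.2 in perspective, the short sketch below computes how much faster each profile is than the brute force reference. The times are copied from the table; the dictionary and variable names are illustrative only and do not come from the original implementation.

```python
# Frame times in milliseconds, copied from Table 2.2
# (Intel HD Graphics 5000, 1280 x 720). Names are illustrative.
times_ms = {
    "Brute force": 209.6,
    "High quality": 23.6,
    "Balanced": 10.35,
    "High performance": 6.19,
}

baseline = times_ms["Brute force"]

# Speed-up of each profile relative to brute force ray marching.
speedups = {
    name: baseline / t
    for name, t in times_ms.items()
    if name != "Brute force"
}

for name, factor in speedups.items():
    print(f"{name}: {factor:.1f}x faster than brute force")
```

With these numbers the speed-up is roughly 9 times for the high quality profile, 20 times for the balanced profile, and 34 times for the high performance profile.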
[Bar chart; axis range 0 to 6. Stages: Reconstruct Camera Z; Render Coord Texture; Render Coarse Inscattering; Refine Sampling; Build 1D Min/max Trees; Ray March; Interpolate; Transform to Rect Coords. Series: High quality, Balanced, High performance.]
Figure 2.10. Performance of different stages of the algorithm on Intel HD Graphics
5000 (1280 × 720 resolution).
Memory requirements shown in the tables reflect only the amount of memory
required to store algorithm-specific resources. They do not take into account
the shadow map, the back buffer, or the camera-space z coordinate texture. The
memory requirements may seem high, but a 16-bit float color buffer and a 32-bit
depth buffer at 2560 × 1600 resolution together occupy 49 MB, so the memory
consumption is reasonable by comparison. Besides, the 56 MB required for the
balanced profile is less than 3% of the 2 GB of video memory available on the
GTX 680.
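The 49 MB and 3% figures can be checked with a few lines of arithmetic. The sketch below assumes a four-channel (RGBA) color buffer, which the text does not state explicitly, and counts megabytes as 10^6 bytes; the function and variable names are illustrative.

```python
def render_target_bytes(width, height, bytes_per_pixel):
    """Size in bytes of an uncompressed render target without mip levels."""
    return width * height * bytes_per_pixel

WIDTH, HEIGHT = 2560, 1600

# Assumption: the color buffer has four channels (RGBA) of 16-bit floats.
COLOR_BYTES_PER_PIXEL = 4 * 2
# 32-bit depth buffer.
DEPTH_BYTES_PER_PIXEL = 4

color_bytes = render_target_bytes(WIDTH, HEIGHT, COLOR_BYTES_PER_PIXEL)
depth_bytes = render_target_bytes(WIDTH, HEIGHT, DEPTH_BYTES_PER_PIXEL)
total_mb = (color_bytes + depth_bytes) / 1e6  # megabytes as 10^6 bytes

# Share of video memory taken by the balanced profile (56 MB of 2 GB).
balanced_mb = 56
video_memory_mb = 2 * 1024
balanced_share = balanced_mb / video_memory_mb

print(f"Color + depth at {WIDTH} x {HEIGHT}: {total_mb:.1f} MB")
print(f"Balanced profile share of video memory: {balanced_share:.1%}")
```

Under these assumptions the two buffers take about 49.2 MB, which matches the figure quoted above, and the balanced profile uses about 2.7% of the 2 GB of video memory.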
Timings for the individual steps of the algorithm are given in Figures 2.9
and 2.10. Some minor steps are not shown on the charts. The algorithmic
complexity of the first and last steps does not depend on the quality settings,
so their cost is almost constant. Slight variations are caused by the different
sizes of the textures accessed from the shaders. Transformation from epipolar to
rectangular coordinates takes approximately 0.9 ms on NVidia hardware. The ray
marching step dominates in the high quality profile, while in the balanced
profile it takes about the same time as the final un-warping. In the high
performance profile, the time of the last step dominates, so decreasing the
quality settings even further will not give a noticeable speed-up (but will
save some memory).
On Intel HD Graphics 5000 hardware, the picture differs because this GPU has
lower relative memory bandwidth. As a result, constructing 1D min/max binary
trees, a purely bandwidth-limited step, takes more time than ray marching for
 